Author

Sung-Hyon Myaeng

Bio: Sung-Hyon Myaeng is an academic researcher from KAIST. The author has contributed to research on topics including Web query classification and Information extraction. The author has an h-index of 24 and has co-authored 175 publications receiving 3076 citations. Previous affiliations of Sung-Hyon Myaeng include Chungnam National University and Information and Communications University.


Papers
Journal ArticleDOI
TL;DR: This paper proposed two empirical heuristics: per-document text normalization and feature weighting method, which performed very well in the standard benchmark collections, competing with state-of-the-art text classifiers based on a highly complex learning method such as SVM.
Abstract: While naive Bayes is quite effective in various data mining tasks, it shows a disappointing result in the automatic text classification problem. Based on the observation of naive Bayes for natural language text, we found a serious problem in the parameter estimation process, which causes poor results in the text classification domain. In this paper, we propose two empirical heuristics: per-document text normalization and a feature weighting method. While these are somewhat ad hoc methods, our proposed naive Bayes text classifier performs very well in the standard benchmark collections, competing with state-of-the-art text classifiers based on a highly complex learning method such as SVM.

430 citations

Proceedings ArticleDOI
01 Jul 2000
TL;DR: This paper proposes a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks, and achieves an improvement of up to 18.5% in effectiveness while reducing the processing time dramatically.
Abstract: As the WWW grows at an increasing speed, classifiers targeted at hypertext have come into high demand. While document categorization is quite mature, the issue of utilizing hypertext structure and hyperlinks has been relatively unexplored. In this paper, we propose a practical method for enhancing both the speed and the quality of hypertext categorization using hyperlinks. In comparison against a recently proposed technique that appears to be the only one of its kind, we obtained an improvement of up to 18.5% in effectiveness while reducing the processing time dramatically. We attempt to explain through experiments what factors contribute to the improvement.

183 citations

Patent
17 Feb 2010
TL;DR: This patent describes a query expansion-based information retrieval method using query/document topic category transition analysis, in which a query input from a user is expanded using a topic category transition analysis result, and corresponding information or documents are retrieved using the expanded query.
Abstract: An information retrieval system and method, and more particularly, a query/document topic category transition analysis system and method in which a query topic category of a query input from a user as an information retrieval keyword and a document topic category of a document which a user regards as relevant and selects from information retrieval results are classified to analyze transition between the query topic category and the document topic category, and a query expansion-based information retrieval system and method using query/document topic category transition analysis in which a query input from a user is expanded using a topic category transition analysis result, and corresponding information or documents are retrieved using the expanded query are provided. The query expansion-based information retrieval method using query/document topic category transition analysis, includes: in a state in which a topic category transition map is generated as a result of analyzing topic category transition between a user query and a relevant document, and corresponding documents are generated as pseudo documents according to each topic category for the user query and the relevant document, determining a corresponding query topic category based on query/document text information for an input query input from a user; allocating a relevant document topic category for the classified query topic category based on the topic category transition map; ranking representative keywords for the query topic category and the relevant document topic category based on the pseudo documents; expanding the input query using the ranked representative keywords; and retrieving corresponding documents using the expanded query.

172 citations

Journal ArticleDOI
TL;DR: This study investigated how effectively cause-effect information can be extracted from newspaper text using a simple computational method (i.e. without knowledge-based inferencing and without full parsing of sentences).
Abstract: This study investigated how effectively cause-effect information can be extracted from newspaper text using a simple computational method (i.e. without knowledge-based inferencing and without full parsing of sentences). An automatic method was developed for identifying and extracting cause-effect information in Wall Street Journal text using linguistic clues and pattern matching. The set of linguistic patterns used for identifying causal relationships was based on a thorough review of the literature and on an analysis of sample sentences from the Wall Street Journal. The cause-effect information extracted using the method was compared with that identified by two human judges. The program successfully extracted ∼68% of the causal relationships identified by both judges (the intersection of the two sets of causal relationships identified by the judges). Of the instances that the computer program identified as causal relationships, ∼25% were identified by both judges, and 64% were identified by at least one of the judges. Problems encountered are discussed.

146 citations
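
The three percentages reported in the abstract above correspond to a recall figure measured against the intersection of the two judges' sets, and two precision figures measured against the intersection and the union. A minimal sketch of that arithmetic follows, assuming each causal relationship is represented by a hypothetical integer ID; the toy sets are made up and do not reproduce the study's data.

```python
def evaluate(program, judge_a, judge_b):
    """Compare program output with two judges' annotations (all arguments are sets)."""
    both = judge_a & judge_b      # relationships identified by both judges
    either = judge_a | judge_b    # relationships identified by at least one judge
    if not program or not both:
        raise ValueError("program output and judge intersection must be non-empty")
    return {
        # Share of the judges' agreed relationships that the program found.
        "recall_both": len(program & both) / len(both),
        # Share of program output confirmed by both judges.
        "precision_both": len(program & both) / len(program),
        # Share of program output confirmed by at least one judge.
        "precision_either": len(program & either) / len(program),
    }

# Hypothetical IDs for illustration only.
scores = evaluate(program={1, 2, 3, 4}, judge_a={1, 2, 5, 7}, judge_b={1, 2, 3, 7})
print(scores)
```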

Proceedings ArticleDOI
11 Aug 2002
TL;DR: The experimental results show that the proposed method outperforms a direct application of a statistical learner often used for subject classification, and it is conjectured that this dual feature set approach can be generalized to improve the performance of subject classification as well.
Abstract: Subject or propositional content has been the focus of most classification research. Genre or style, on the other hand, is a different and important property of text, and automatic text genre classification is becoming important for classification and retrieval purposes as well as for some natural language processing research. In this paper, we present a method for automatic genre classification that is based on statistically selected features obtained from both subject-classified and genre-classified training data. The experimental results show that the proposed method outperforms a direct application of a statistical learner often used for subject classification. We also observe that the deviation formula and discrimination formula using document frequency ratios work as expected. We conjecture that this dual feature set approach can be generalized to improve the performance of subject classification as well.

136 citations


Cited by
01 Jan 2002

9,314 citations

Journal ArticleDOI
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

7,539 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations

01 Jan 1964
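TL;DR: This book is organized in two parts: Part I reports experimental studies of perceiving, imaging, and remembering and presents a theory of remembering, and Part II treats remembering as a study in social psychology, including the notion of a collective unconscious and the basis of social recall.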
Abstract: Part I. Experimental Studies: 2. Experiment in psychology 3. Experiments on perceiving III Experiments on imaging 4-8. Experiments on remembering: (a) The method of description (b) The method of repeated reproduction (c) The method of picture writing (d) The method of serial reproduction (e) The method of serial reproduction picture material 9. Perceiving, recognizing, remembering 10. A theory of remembering 11. Images and their functions 12. Meaning Part II. Remembering as a Study in Social Psychology: 13. Social psychology 14. Social psychology and the matter of recall 15. Social psychology and the manner of recall 16. Conventionalism 17. The notion of a collective unconscious 18. The basis of social recall 19. A summary and some conclusions.

5,690 citations