Topic

Categorization

About: Categorization is a research topic. Over its lifetime, 12,919 publications have been published within this topic, receiving 500,009 citations. The topic is also known as: categorisation.


Papers
Book Chapter
21 Apr 1998
TL;DR: This paper explores the use of Support Vector Machines for learning text classifiers from examples and analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task.
Abstract: This paper explores the use of Support Vector Machines (SVMs) for learning text classifiers from examples. It analyzes the particular properties of learning with text data and identifies why SVMs are appropriate for this task. Empirical results support the theoretical findings. SVMs achieve substantial improvements over the currently best performing methods and behave robustly over a variety of different learning tasks. Furthermore they are fully automatic, eliminating the need for manual parameter tuning.

8,658 citations

Journal Article
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

7,539 citations

01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings Article
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,626 citations

Book
08 Sep 2020
TL;DR: A review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers.
Abstract: Behavioral scientists routinely publish broad claims about human psychology and behavior in the world's top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers - often implicitly - assume that either there is little variation across human populations, or that these "standard subjects" are as representative of the species as any other population. Are these assumptions justified? Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species - frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior - hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation. Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity. We close by proposing ways to structurally re-organize the behavioral sciences to best tackle these challenges.

6,370 citations


Network Information
Related Topics (5)
Cognition
99.9K papers, 4.3M citations
87% related
Social relation
29.1K papers, 1.7M citations
83% related
Working memory
26.5K papers, 1.6M citations
81% related
Social network
42.9K papers, 1.5M citations
81% related
Inference
36.8K papers, 1.3M citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  2,260
2022  4,263
2021  526
2020  544
2019  565
2018  555