Proceedings ArticleDOI

Automatic text categorization of news articles

28 Apr 2004 - pp. 224-226
TL;DR: In this study, a system is developed for automatic text categorization of news articles; it separates articles into 5 different classes and achieves a 76% success ratio.
Abstract: Categorizing data reduces access time. Nowadays, the Internet is one of the biggest data resources. However, most of the data on the Internet is written in natural language. To use the Internet more efficiently, this data needs to be categorized. The amount of data and its growth rate are so high that this process cannot be done by hand. Hence, the need for automatic text categorization systems is increasing. In contrast to other languages, there are few studies on Turkish texts. In this study, a system is developed for automatic text categorization of news articles. The articles are classified into 5 different classes and a 76% success ratio is achieved.
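The abstract does not specify the classification method; purely as an illustration, here is a minimal sketch of a 5-class news categorization pipeline in Python. The texts, labels, and class names below are assumed for the example, not taken from the paper.

    # Minimal sketch only: TF-IDF features with a Naive Bayes classifier
    # for 5-class news categorization. Toy texts and class names are
    # hypothetical; the paper's actual method and data are not shown here.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "central bank raises interest rates again",
        "midfielder scores twice in the derby",
        "new vaccine trial shows promising results",
        "parliament debates the new election law",
        "chipmaker unveils a faster mobile processor",
    ]
    labels = ["economy", "sport", "health", "politics", "technology"]

    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["the striker signed a new contract"]))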
Citations
Proceedings ArticleDOI
01 Oct 2018
TL;DR: Results show that the Naive Bayes probability model can be used as an effective classifier in categorizing Turkish texts compared to other methods.
Abstract: Text classification is the supervised assignment of text documents to one or more predefined categories or classes according to the content of the processed texts, using natural language processing methods. Text classification is actively applied in fields such as the categorization of social interactions, web pages, and news texts, search engine optimization, information extraction, and the automatic processing of e-mails. This study aims to classify Turkish texts with methods based on supervised machine learning; the classification success of supervised learning models on Turkish texts was analyzed with different parameters. The models were tested on news texts from five predefined classes (economy, politics, sport, health, and technology), and the system was trained with different numbers of training documents before the classification was carried out. The classification performances of the Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Machine, K-Nearest Neighbor, and Decision Tree algorithms on Turkish news texts are compared and interpreted in light of the results obtained with different parameters. The procedure with the best classification success was the Multinomial Naive Bayes algorithm, with a classification success of about 90%. These results show that the Naive Bayes probability model can be used as an effective classifier for Turkish texts compared to the other methods. It is envisaged that the proposed methodology could be applied to Turkish texts on different web platforms (social networks, forums, communication networks, etc.) for different purposes.
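As a rough illustration of the comparison described in this abstract (the study's corpus, preprocessing, and parameter settings are not reproduced, and the toy documents below are assumed), a sketch using scikit-learn counterparts of the five algorithm families:

    # Sketch of the five-way classifier comparison; CART stands in for
    # C4.5, and LinearSVC for the SVM. Toy data, two documents per class.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB, BernoulliNB
    from sklearn.svm import LinearSVC
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    texts = [
        "inflation slowed last quarter", "the central bank cut rates",
        "the coalition lost its majority", "voters head to the polls",
        "the home team won the final", "a record sprint time was set",
        "doctors recommend the new therapy", "the outbreak is contained",
        "the phone ships with a new chip", "the update patches the flaw",
    ]
    labels = ["economy", "economy", "politics", "politics", "sport",
              "sport", "health", "health", "technology", "technology"]

    classifiers = {
        "Multinomial NB": MultinomialNB(),
        "Bernoulli NB": BernoulliNB(),
        "Linear SVM": LinearSVC(),
        "k-NN": KNeighborsClassifier(n_neighbors=1),
        "Decision tree": DecisionTreeClassifier(random_state=0),
    }
    for name, clf in classifiers.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)
        score = cross_val_score(pipe, texts, labels, cv=2).mean()
        print(f"{name}: mean accuracy {score:.2f}")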

19 citations


Cites background from "Automatic text categorization of ne..."

  • ...There are also classification-based studies on Turkish texts in limited numbers, and supervised machine learning approaches have been used in most of them [10], [15]–[17]....

    [...]

Proceedings ArticleDOI
Ömer Köksal
01 Aug 2020
TL;DR: This paper proposes a methodology and expresses key points for tuning the Turkish text classification process using supervised machine learning algorithms, and shows that the methodology improves categorization results based on F1-score.
Abstract: Text classification is the process of determining the categories or tags of a document depending on its content. Although it is a well-known process, it has many steps that require tuning to obtain better mathematical models. As Turkish is an agglutinative language, the Turkish text classification process in particular requires some extra tuning and preprocessing steps. This paper proposes a methodology and expresses key points for tuning the Turkish text classification process using supervised machine learning algorithms. For this purpose, we perform intensive experiments on an open Turkish news dataset. Our study shows that our methodology improves categorization results based on F1-score.
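The abstract does not enumerate the tuning steps; one example of the Turkish-specific preprocessing such work typically needs is locale-correct lowercasing, since Turkish distinguishes dotted and dotless i. A small assumed sketch, not necessarily the paper's exact pipeline:

    # Assumed example of one Turkish-specific preprocessing step:
    # Python's str.lower() maps "I" to "i", but Turkish lowercases
    # "I" to dotless "ı" and "İ" to "i", so the mapping must be fixed
    # before the standard lowercasing is applied.
    def turkish_lower(text: str) -> str:
        return text.replace("İ", "i").replace("I", "ı").lower()

    print(turkish_lower("İSTANBUL IŞIK"))  # -> istanbul ışık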

12 citations

Journal ArticleDOI
TL;DR: This study focuses on the categorization of documents in agglutinative languages, using standard word-form-based language models as well as modified language models based on root words, root words with part-of-speech information, truncated word forms, and character sequences, and searching for an optimum parameter set.
Abstract: In this paper, we investigate the document categorization task with statistical language models. Our study mainly focuses on categorization of documents in agglutinative languages. Due to the productive morphology of agglutinative languages, the number of word forms encountered in naturally occurring text is very large. From the language modeling perspective, a large vocabulary results in serious data sparseness problems. In order to cope with this drawback, previous studies in various application areas suggest modified language models based on different morphological units. It is reported that performance improvements can be achieved with these modified language models. In our document categorization experiments, we use standard word form based language models as well as other modified language models based on root words, root words and part-of-speech information, truncated word forms and character sequences. Additionally, to find an optimum parameter set, multiple tests are carried out with different la...
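One of the modified representations listed above, character sequences, can be sketched as follows. This sketch uses simple n-gram counting rather than the paper's statistical language models, and the Turkish word forms are assumed examples:

    # Character n-grams soften the huge word-form vocabulary of
    # agglutinative languages: inflected forms of one root still share
    # features. "evlerimizden" and "evdekiler" both derive from the
    # root "ev" (house) and share n-grams such as " ev" and "ler".
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["evlerimizden", "evdekiler"]
    vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 3))
    vec.fit(docs)
    print(sorted(vec.get_feature_names_out()))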

12 citations

Proceedings Article
01 Nov 2006
TL;DR: This paper focuses on the optimization of the weighting parameters, which are functions of word frequency, and shows that the new weighted kernel achieves better classification accuracy.
Abstract: The traditional bag-of-words model and the more recent word-sequence kernel are two well-known techniques in the field of text categorization. The bag-of-words representation neglects word order, which can reduce classification accuracy for some types of documents. The word-sequence kernel takes word order into account but does not include all word-frequency information. A weighted kernel model that combines these two models was proposed by the authors [1]. This paper focuses on the optimization of the weighting parameters, which are functions of word frequency. Experiments conducted on the Reuters database show that the new weighted kernel achieves better classification accuracy.
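The authors' kernel and its frequency-dependent weights are not given in this abstract; as a generic, assumed sketch of the idea, a convex combination of an order-free kernel and an order-sensitive one, fed to a kernelized SVM:

    # Generic sketch, not the authors' formulation: K = lam * K_bow +
    # (1 - lam) * K_seq combines an order-free bag-of-words kernel with
    # an order-sensitive kernel (word-bigram counts as a crude proxy for
    # the word-sequence kernel). The scalar lam stands in for the
    # frequency-dependent weighting parameters the paper optimizes.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import SVC

    docs = ["the cat sat on the mat", "the mat sat on the cat",
            "stocks fell sharply today", "today stocks fell sharply"]
    y = [0, 0, 1, 1]

    bow = CountVectorizer().fit_transform(docs).toarray()
    seq = CountVectorizer(ngram_range=(2, 2)).fit_transform(docs).toarray()
    K_bow = bow @ bow.T          # ignores word order
    K_seq = seq @ seq.T          # sensitive to word order

    lam = 0.5                    # assumed weight for illustration
    K = lam * K_bow + (1 - lam) * K_seq
    clf = SVC(kernel="precomputed").fit(K, y)
    print(clf.predict(K))        # predictions on the training documents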

9 citations


Cites methods from "Automatic text categorization of ne..."

  • ...Experiments have been conducted with the Reuters database and show that the new weighted kernel achieves better classification accuracy....

    [...]

Posted Content
TL;DR: This study aims to classify poetry by poet using a data set of English-language poems by three different poets, trying five different classification algorithms.
Abstract: With the widespread use of the Internet, the size of text data increases day by day; poems are one example of this growing body of text. In this study, we aim to classify poetry according to poet. First, a data set consisting of English-language poems by three different poets was constructed. Then, text categorization techniques were applied to it. The chi-square technique was used for feature selection. In addition, five different classification algorithms were tried: sequential minimal optimization, Naive Bayes, C4.5 decision tree, Random Forest, and k-nearest neighbors. Although each classifier showed very different results, a classification success rate of over 70% was obtained with the sequential minimal optimization technique.
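The chi-square selection step described above can be sketched as follows. The tiny corpus (well-known lines from two poets), the feature count k, and the classifier are assumptions for illustration, not the study's settings:

    # Sketch of chi-square feature selection before classification.
    # k=10 and the toy corpus below are illustrative assumptions.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    poems = ["shall i compare thee to a summer day",
             "rough winds do shake the darling buds of may",
             "the woods are lovely dark and deep",
             "and miles to go before i sleep"]
    poets = ["shakespeare", "shakespeare", "frost", "frost"]

    model = make_pipeline(CountVectorizer(),
                          SelectKBest(chi2, k=10),  # keep the 10 best terms
                          MultinomialNB())
    model.fit(poems, poets)
    print(model.predict(["whose woods these are i think i know"]))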

5 citations
