Open Access Journal Article

Clustering of Authors’ Texts of English Fiction in the Vector Space of Semantic Fields

Bohdan M. Pavlyshenko
01 Sep 2014, Vol. 14, Iss. 3, pp. 25-36
TLDR
In this article, the authors' styles and idiolects in English fiction were analyzed in the vector space of semantic fields and in the semantic space with orthogonal basis.
Abstract
This paper analyzes whether an author's idiolect can be differentiated in the space of semantic fields, and examines the clustering of text documents both in the vector space of semantic fields and in a semantic space with an orthogonal basis. The analysis shows that a vector space model built on semantic fields is effective for cluster analysis of authors' texts in English fiction. The distribution of authors' texts across the cluster structure reveals areas of the semantic space that represent the idiolects of individual authors; these areas correspond to clusters in which a single author dominates. Clusters in which the texts of several authors dominate can be regarded as areas of semantic similarity between authors' styles. SVD factorization of the semantic fields matrix makes it possible to significantly reduce the dimension of the semantic space in the cluster analysis of authors' texts. Clustering in the semantic field vector space can be effective for comparative analysis of authors' styles and idiolects. The clusters of some authors' idiolects are semantically invariant: they do not depend on changes in the basis of the semantic space or on the clustering method.
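
The pipeline the abstract describes (a semantic-field frequency matrix, SVD reduction to an orthogonal basis, then clustering and a per-cluster author breakdown) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the paper's implementation: the toy semantic fields in `FIELDS`, the whitespace tokenizer, all function names, and the choice of k-means with farthest-point initialization, since the abstract does not name a clustering algorithm.

```python
from collections import Counter

import numpy as np

# Hypothetical semantic fields; the paper's actual field inventory is not given here.
FIELDS = {
    "motion": ["run", "walk", "go"],
    "emotion": ["love", "fear", "joy"],
    "nature": ["river", "tree", "sky"],
}


def field_frequency_matrix(docs, fields):
    """Return a (fields x documents) matrix of relative field frequencies."""
    word_to_field = {w: i for i, words in enumerate(fields.values()) for w in words}
    matrix = np.zeros((len(fields), len(docs)))
    for j, doc in enumerate(docs):
        tokens = doc.lower().split()  # naive tokenizer, no lemmatization (assumption)
        for token in tokens:
            i = word_to_field.get(token)
            if i is not None:
                matrix[i, j] += 1
        if tokens:
            matrix[:, j] /= len(tokens)
    return matrix


def reduce_with_svd(matrix, k):
    """Project each document onto the first k left singular vectors."""
    u, _, _ = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k].T @ matrix).T  # shape: (documents, k)


def kmeans(points, n_clusters, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return labels


def author_shares(labels, authors):
    """For each cluster, return the fraction of its texts written by each author."""
    shares = {}
    for c in sorted(set(labels.tolist())):
        counts = Counter(a for a, l in zip(authors, labels) if l == c)
        total = sum(counts.values())
        shares[c] = {a: n / total for a, n in counts.items()}
    return shares


# Toy corpus: two invented "authors" with two short texts each.
docs = [
    "run walk go river",
    "walk go run tree",
    "love fear joy sky",
    "joy love fear river",
]
authors = ["A", "A", "B", "B"]

reduced = reduce_with_svd(field_frequency_matrix(docs, FIELDS), k=2)
labels = kmeans(reduced, n_clusters=2)
print(author_shares(labels, authors))
```

In this sketch, a cluster whose share for one author is close to 1.0 corresponds to what the abstract calls an area representing an individual author's idiolect, while a cluster with mixed shares corresponds to an area of semantic similarity between authors' styles. Rerunning the same steps with and without `reduce_with_svd`, or with a different clustering routine, is one way to probe the invariance the abstract reports.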


Citations
Proceedings Article

Structure-based Clustering of Novels

TL;DR: This work builds social networks from novels as a strategy to quantify their plot and structure and performs clustering over the vectors obtained, and the resulting groups are contrasted in terms of author and genre.
Journal Article

Clustering of Novels Represented as Social Networks

TL;DR: This paper builds static and dynamic social networks of characters as a strategy to represent the narrative structure of novels in a quantifiable manner and performs clustering on the vectors and analyzes the resulting clusters in terms of genre and authorship.
Journal Article

Methods of Informational Trends Analytics and Fake News Detection on Twitter

Bohdan M. Pavlyshenko
11 Apr 2022
TL;DR: Informational trends caused by the Russian invasion of Ukraine in 2022 are studied, and the possible impact of these trends on different companies working in Russia during the invasion is considered.
Journal Article

The Distribution of Semantic Fields in Author's Texts

TL;DR: An analysis of the frequency distribution of semantic fields of nouns and verbs in texts of English fiction, using the Shapiro-Wilk test, showed that the author's idiolect is represented in the vector space of semantic fields.
References
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries.
Book

Introduction to Information Retrieval

TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Journal Article

Machine learning in automated text categorization

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Journal Article

From frequency to meaning: vector space models of semantics

TL;DR: The goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs, and to provide pointers into the literature for those who are less familiar with the field.
Proceedings Article

Fast and effective text mining using linear-time document clustering

TL;DR: An unsupervised, near-linear time text clustering system that offers a number of algorithm choices for each phase, and a refinement to center adjustment, “vector average damping,” that further improves cluster quality.