Clustering of Authors’ Texts of English Fiction in the Vector Space of Semantic Fields
TLDR
This article analyzes authors' styles and idiolects in English fiction in the vector space of semantic fields and in a semantic space with an orthogonal basis.
Abstract:
This paper examines whether authors' idiolects can be differentiated in the space of semantic fields, and analyzes the clustering of text documents both in the vector space of semantic fields and in a semantic space with an orthogonal basis. The analysis showed that a vector space model based on semantic fields is effective for cluster analysis of authors' texts in English fiction. The distribution of authors' texts across the cluster structure revealed regions of the semantic space that represent the idiolects of individual authors; these regions correspond to clusters dominated by a single author. Clusters in which the texts of several authors dominate can be regarded as regions of semantic similarity between authors' styles. SVD factorization of the semantic field matrix makes it possible to reduce the dimensionality of the semantic space significantly in the cluster analysis of authors' texts. Clustering in the semantic field vector space can be effective for the comparative analysis of authors' styles and idiolects. The clusters corresponding to some authors' idiolects are semantically invariant: they do not change when the basis of the semantic space or the clustering method is changed.
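The abstract does not specify how words are assigned to semantic fields, which clustering algorithm is used, or how many SVD components are kept. The following is only a minimal NumPy sketch of the general pipeline under stated assumptions: each text has already been reduced to raw counts of words per semantic field; rows are normalized to relative frequencies; the orthogonal basis comes from a truncated SVD of the text-by-field matrix; and plain k-means (a swapped-in choice, not necessarily the authors' method) does the clustering. The function names, the toy counts, and the author labels "A" and "B" are hypothetical.

```python
import numpy as np
from collections import Counter


def field_frequency_matrix(counts):
    """Turn raw counts (texts x semantic fields) into per-text relative frequencies."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero rows as zeros instead of dividing by 0
    return counts / totals


def svd_reduce(X, k):
    """Coordinates of each text in the top-k orthogonal basis from a truncated SVD.

    X = U S V^T; projecting onto the first k right singular vectors gives
    X @ V_k, which equals U_k * s_k.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest already-chosen center
        nearest = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[nearest.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # assign each text to the closest center (squared Euclidean distance)
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its members; keep empty clusters in place
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def dominant_author(labels, authors):
    """For each cluster: (most frequent author, that author's share of the cluster)."""
    result = {}
    for j in sorted(set(labels.tolist())):
        members = [a for a, lab in zip(authors, labels) if lab == j]
        author, n = Counter(members).most_common(1)[0]
        result[j] = (author, n / len(members))
    return result


# Hypothetical toy data: 6 texts x 4 semantic fields, two authors "A" and "B".
counts = [
    [30, 20, 2, 1],
    [28, 25, 3, 2],
    [35, 18, 1, 3],
    [2, 3, 30, 25],
    [1, 4, 27, 29],
    [3, 2, 33, 22],
]
authors = ["A", "A", "A", "B", "B", "B"]

X = field_frequency_matrix(counts)
Z = svd_reduce(X, k=2)  # 4 semantic fields reduced to 2 dimensions
labels = kmeans(Z, k=2)
shares = dominant_author(labels, authors)
print(shares)  # → {0: ('A', 1.0), 1: ('B', 1.0)}
```

In this sketch, a cluster whose dominant share is close to 1.0 plays the role of a single-author region of the semantic space, while a cluster with several authors at comparable shares would indicate stylistic similarity. The signs of the singular vectors returned by the SVD are arbitrary, which does not affect the Euclidean distances k-means relies on; farthest-first initialization is used here only to make the toy run deterministic.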