Clustering of Authors’ Texts of English Fiction in the Vector Space of Semantic Fields

doi:10.2478/CAIT-2014-0030

Open Access Journal Article

Bohdan M. Pavlyshenko

01 Sep 2014

Cybernetics and Information Technologies

Vol. 14, Iss. 3, pp. 25-36

TLDR

This article analyzes authors' styles and idiolects in English fiction in the vector space of semantic fields and in a semantic space with an orthogonal basis.

Abstract:

This paper analyzes whether an author's idiolect can be differentiated in the space of semantic fields. It also analyzes the clustering of text documents in the vector space of semantic fields and in a semantic space with an orthogonal basis. The analysis showed that a vector space model based on semantic fields is efficient in the cluster analysis of authors' texts in English fiction. The distribution of authors' texts across the cluster structure revealed areas of the semantic space that represent the idiolects of individual authors. Such areas correspond to clusters in which only one author dominates. Clusters in which the texts of several authors dominate can be regarded as areas of semantic similarity between authors' styles. SVD factorization of the semantic fields matrix makes it possible to reduce the dimension of the semantic space significantly in the cluster analysis of authors' texts. Clustering in the vector space of semantic fields can be efficient in a comparative analysis of authors' styles and idiolects. The clusters of some authors' idiolects are semantically invariant: they do not depend on changes in the basis of the semantic space or on the clustering method.
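The idea of representing texts as vectors of semantic-field frequencies and then clustering them can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and does not come from the paper: the three toy semantic fields, the sample sentences, the helper names `field_vector` and `kmeans`, and the choice of plain k-means with fixed seeds. The SVD reduction step mentioned in the abstract is not shown.

```python
from collections import Counter
import re

# Hypothetical toy semantic fields: each field is just a set of words.
SEMANTIC_FIELDS = {
    "sea": {"ship", "sail", "wave", "captain", "harbor"},
    "emotion": {"love", "fear", "joy", "grief", "hope"},
    "law": {"court", "judge", "trial", "law", "verdict"},
}
FIELD_NAMES = sorted(SEMANTIC_FIELDS)  # fixed order of vector components


def field_vector(text):
    """Relative frequency of each semantic field in one text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tok in tokens:
        for name in FIELD_NAMES:
            if tok in SEMANTIC_FIELDS[name]:
                counts[name] += 1
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[name] / total for name in FIELD_NAMES]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors are used as seeds (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical sample texts standing in for authors' documents.
texts = [
    "The captain set sail as a wave hit the ship",
    "The judge opened the trial in court",
    "A ship left the harbor under sail",
    "The court heard the verdict of the judge",
]
vectors = [field_vector(t) for t in texts]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy setting, texts dominated by the same semantic field end up in the same cluster. With real fiction, the fields would come from a lexical resource and the cluster composition per author would be inspected, as the abstract describes.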
