Syntactic dependency-based n-grams as classification features

doi:10.1007/978-3-642-37798-3_1

Book chapter

Grigori Sidorov and four co-authors

pp. 1-11

TLDR

Describes how sn-grams were applied to authorship attribution using an SVM classifier with several profile sizes; sn-grams gave better results than the traditional n-gram baselines.
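
The TLDR mentions an SVM trained with several profile sizes. The following is a minimal sketch of how a profile and per-document feature vectors could be built. It assumes that a profile is the k most frequent features in the training corpus and that each document is represented by the relative frequencies of those features. The chapter's exact weighting is not given here, and the helper names `word_ngrams`, `build_profile` and `vectorize` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def word_ngrams(tokens: Sequence[str], n: int) -> List[str]:
    """Traditional n-grams: neighbours are adjacent words in surface order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_profile(docs: Iterable[List[str]], size: int) -> List[str]:
    """Hypothetical profile: the `size` most frequent features over all training documents."""
    counts: Counter = Counter()
    for feats in docs:
        counts.update(feats)
    # Descending frequency, ties broken alphabetically so the profile is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [feat for feat, _ in ranked[:size]]


def vectorize(feats: Sequence[str], profile: List[str]) -> List[float]:
    """Relative frequency of each profile feature in one document (0.0 if absent)."""
    counts = Counter(feats)
    total = len(feats) or 1  # avoid division by zero for empty documents
    return [counts[f] / total for f in profile]


if __name__ == "__main__":
    train = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
    feats = [word_ngrams(t, 2) for t in train]
    profile = build_profile(feats, size=2)
    print(profile)  # ['the cat', 'a dog']
    print([vectorize(f, profile) for f in feats])  # [[0.5, 0.0], [0.5, 0.0], [0.0, 0.5]]
```

The resulting vectors could then be passed to any linear SVM, for example scikit-learn's `sklearn.svm.LinearSVC`. Varying `size` corresponds to trying different profile sizes. The same functions work unchanged if the feature strings are sn-grams instead of word n-grams.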

Abstract:

In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in which elements are considered neighbors. In sn-grams, neighbors are obtained by following syntactic relations in syntactic trees rather than by taking words in the order they appear in the text. Dependency trees fit this idea directly, while constituency trees require a few simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier with several profile sizes. As baselines we used traditional n-grams of words, POS tags and characters. The results obtained with sn-grams are better than these baselines.
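
The following minimal Python sketch makes the notion of syntactic neighbors concrete. It extracts continuous sn-grams, meaning downward head-to-dependent paths of n nodes, from a dependency tree. It assumes the parse is given as a CoNLL-style list of 1-based head indices (0 for the root) and writes each sn-gram head-first. The example parse is hand-written rather than produced by a parser, and the function names are hypothetical. Passing POS tags or relation labels instead of `words` would give sn-grams of those elements.

```python
from typing import Dict, List, Sequence


def children_map(heads: Sequence[int]) -> Dict[int, List[int]]:
    """Map each 1-based token index (and 0 for the root) to its dependents, in surface order."""
    kids: Dict[int, List[int]] = {i: [] for i in range(len(heads) + 1)}
    for dep, head in enumerate(heads, start=1):
        kids[head].append(dep)
    return kids


def sn_grams(words: Sequence[str], heads: Sequence[int], n: int) -> List[str]:
    """Continuous sn-grams: every downward path of n tokens in the dependency tree.

    heads[i] is the 1-based head of token i+1; 0 marks the root.
    """
    kids = children_map(heads)
    result: List[str] = []

    def extend(path: List[int]) -> None:
        # Once the path has n nodes, emit it head-first as a space-joined string.
        if len(path) == n:
            result.append(" ".join(words[i - 1] for i in path))
            return
        # Otherwise follow each syntactic relation one step down the tree.
        for child in kids[path[-1]]:
            extend(path + [child])

    for start in range(1, len(words) + 1):
        extend([start])
    return result


if __name__ == "__main__":
    words = ["John", "saw", "a", "big", "dog"]
    heads = [2, 0, 5, 5, 2]  # hand-written parse: "saw" is the root, "dog" its object
    print(sn_grams(words, heads, 2))  # ['saw John', 'saw dog', 'dog a', 'dog big']
    print(sn_grams(words, heads, 3))  # ['saw dog a', 'saw dog big']
```

In this example "saw dog" is an sn-gram because dog is the object of saw. It is not a traditional word bigram, because "a big" intervenes in the surface string. The sketch covers only continuous paths; constituency parses would first have to be converted to dependencies, as the abstract notes.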
