Book Chapter
Syntactic dependency-based n-grams as classification features
Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh, Liliana Chanona-Hernández
pp. 1-11
TLDR: The chapter describes how sn-grams were applied to authorship attribution using an SVM classifier with several profile sizes; sn-grams gave better results than the traditional n-gram baselines.
Abstract:
In this paper we introduce the concept of syntactic n-grams (sn-grams). Sn-grams differ from traditional n-grams in which elements are considered neighbors. In sn-grams, neighbors are found by following syntactic relations in syntactic trees, not by taking words in the order they appear in the text. Dependency trees fit this idea directly, while constituency trees require a few simple additional steps. Sn-grams can be applied in any NLP task where traditional n-grams are used. We describe how sn-grams were applied to authorship attribution, using an SVM classifier with several profile sizes. As baselines we used traditional n-grams of words, POS tags, and characters. The results obtained with sn-grams are better.
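To make the neighbor relation concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a dependency parse is already available as a list of head indices (the example parse below is hand-written in Stanford basic-dependency style, not produced by a real parser), and it extracts only the simplest kind of sn-gram: paths that run from a head down through its dependents. The helper names `linear_ngrams`, `sn_grams`, `build_profile`, and `to_vector` are hypothetical. The profile is assumed to be the k most frequent features over the training texts, and the per-text frequencies of those features are what would be fed to a classifier such as an SVM.

```python
from collections import Counter


def linear_ngrams(words, n):
    """Traditional n-grams: windows of n consecutive words in text order."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def sn_grams(words, heads, n):
    """Sn-grams as paths of n words following head -> dependent arcs.

    heads[i] is the index of the syntactic head of token i, or -1 for the root.
    """
    # Invert the head list into a head -> dependents map.
    children = {i: [] for i in range(len(words))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    result = []

    def walk(path):
        # Record the path once it holds n tokens; otherwise extend it
        # downward to each dependent of its last token.
        if len(path) == n:
            result.append(tuple(words[i] for i in path))
            return
        for child in children[path[-1]]:
            walk(path + [child])

    for start in range(len(words)):
        walk([start])
    return result


def build_profile(feature_lists, k):
    """Assumed profile: the k most frequent features over all training texts."""
    counts = Counter(f for feats in feature_lists for f in feats)
    return [f for f, _ in counts.most_common(k)]


def to_vector(features, profile):
    """Frequency of each profile feature in one text (classifier input)."""
    counts = Counter(features)
    return [counts[f] for f in profile]


# Hypothetical hand-written parse of "the cat sat on the mat":
# "sat" is the root, "on" attaches to "sat", and "mat" attaches to "on".
words = ["the", "cat", "sat", "on", "the", "mat"]
heads = [1, 2, -1, 2, 5, 3]

print(linear_ngrams(words, 2))
# [('the', 'cat'), ('cat', 'sat'), ('sat', 'on'), ('on', 'the'), ('the', 'mat')]
print(sn_grams(words, heads, 2))
# [('cat', 'the'), ('sat', 'cat'), ('sat', 'on'), ('on', 'mat'), ('mat', 'the')]
print(sn_grams(words, heads, 3))
# [('sat', 'cat', 'the'), ('sat', 'on', 'mat'), ('on', 'mat', 'the')]
```

In this toy sentence, *on the* is a traditional bigram but not an sn-gram, while *on mat* is an sn-gram whose words are not adjacent in the text: the syntactic relation links the preposition to its object and skips the article.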
Citations
Journal Article
Synthesis Lectures on Human Language Technologies
Journal Article
Text Classification Algorithms: A Survey
Kamran Kowsari, Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura E. Barnes, Donald E. Brown
TL;DR: An overview of text classification algorithms is presented, covering text feature extraction, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods.
Journal Article
Syntactic N-grams as machine learning features for natural language processing
Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh, Liliana Chanona-Hernández
TL;DR: Sn-grams can be applied in any natural language processing (NLP) task where traditional n-grams are used; the article describes how sn-grams were applied to authorship attribution.
Journal Article
Towards an intelligent framework for multimodal affective data analysis
TL;DR: A novel multimodal information extraction agent is proposed, which infers and aggregates the semantic and affective information associated with user-generated multi-modal data in contexts such as e-learning, e-health, automatic video content tagging and human-computer interaction.
Patent
Natural language processing system and method
TL;DR: A natural language processing system is described that includes a language decoder generating information stored in a three-level framework (word, clause, phrase).
References
Journal Article
Machine learning in automated text categorization
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Journal Issue
A survey of modern authorship attribution methods
TL;DR: A survey of recent advances in automated approaches to authorship attribution is presented, examining their characteristics in terms of both text representation and text classification.
Book
Authorship Attribution
TL;DR: This review shows that the authorship attribution discipline is quite successful, even in difficult cases involving small documents in unfamiliar and less studied languages; it further analyzes the types of analysis and features used and tries to determine characteristics of well-performing systems, finally formulating these in a set of recommendations for best practices.
Journal Article
The Evolution of Stylometry in Humanities Scholarship
TL;DR: The article traces the historical development of statistical methods in the analysis of literary style, starting with stylometry's early origins, and looks at both successful and unsuccessful applications, as well as the internal struggles as statisticians search for a proven methodology.
Journal Article
Applying authorship analysis to extremist-group Web forum messages
Ahmed Abbasi, Hsinchun Chen
TL;DR: A special multilingual model (a set of algorithms and related features) is developed to identify Arabic messages and geared toward the language's unique characteristics; it incorporates a complex message extraction component that allows a more comprehensive set of features tailored specifically to online messages.