A Computational Linguistic Approach for Gender Prediction Based on Vietnamese Names
TLDR
A model that combines N-gram features of the full name with a middle-name feature specific to the Vietnamese language is proposed; it achieves 90.9% accuracy on gender prediction.
Abstract:
Gender prediction has been studied extensively in recent years because it is applied in many fields. Several cues have been investigated for determining whether a person is male or female, including facial images, voice, gait, and fingerprints. In this study, we present a machine learning approach for gender determination based on Vietnamese names. We propose a model that uses N-gram features of the full name combined with a dedicated middle-name feature that reflects the specifics of the Vietnamese language. Evaluated on the GenderVN1.0 dataset (3 million Vietnamese names), the model achieves 90.9% accuracy on the gender prediction task.
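As a rough illustration of the feature design described in the abstract, the sketch below extracts N-gram features from a full name and adds a separate middle-name feature. It is not the authors' implementation: the abstract does not say whether the N-grams are taken over characters or syllables, which values of N are used, or which classifier consumes the features. Character bigrams and trigrams, the boundary markers `^` and `$`, and the function names `char_ngrams` and `name_features` are all assumptions made for this example.

```python
import unicodedata
from collections import Counter


def char_ngrams(text, n):
    """Return the character n-grams of text, with ^ and $ marking its boundaries."""
    padded = "^" + text + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def name_features(full_name, n_values=(2, 3)):
    """Build a sparse feature map for one Vietnamese full name.

    Assumption: names follow the usual Vietnamese order of
    family name, optional middle name(s), given name.
    """
    # Normalize to NFC so composed and decomposed diacritics compare equal.
    normalized = unicodedata.normalize("NFC", full_name).lower()
    tokens = normalized.split()
    feats = Counter()

    # N-gram features over the whole name (character level is an assumption).
    joined = " ".join(tokens)
    for n in n_values:
        for gram in char_ngrams(joined, n):
            feats[f"ng{n}:{gram}"] += 1

    # Dedicated middle-name feature: every token between the first and the last.
    middle = tokens[1:-1]
    feats["middle:" + (" ".join(middle) if middle else "<none>")] = 1
    return feats


if __name__ == "__main__":
    for name in ("Nguyễn Thị Lan", "Nguyễn Văn Nam"):
        feats = name_features(name)
        print(name, [k for k in feats if k.startswith("middle:")])
```

The middle name is kept as its own feature because Vietnamese middle names often carry a strong gender signal: "Thị" is common in female names and "Văn" in male names. The resulting feature maps could be passed to any standard classifier; the choice of classifier is left open here because the abstract does not name one.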