Open Access Journal Article

A Computational Linguistic Approach for Gender Prediction Based on Vietnamese Names

Thien Ho Huong and two co-authors
27 Feb 2022, Vol. 2022, pp. 1-6
TLDR
The paper proposes an N-gram model over the full name, combined with a dedicated middle-name feature that reflects the specific structure of Vietnamese names. The model achieves 90.9% accuracy on gender prediction.
Abstract
Gender prediction has been extensively studied in recent years because it is widely applied in many fields. Several signals have been investigated for determining whether a person is male or female, including facial images, voice, gait, and fingerprints. In this study, we present a machine learning approach for gender determination based on Vietnamese names. We propose a model that applies N-grams to the full name and combines them with a dedicated middle-name feature that reflects the specific structure of Vietnamese names. Evaluated on the GenderVN1.0 dataset (3 million Vietnamese names), the model achieves 90.9% accuracy.
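The abstract does not spell out the model's internals, so the following is only a rough sketch of the idea, not the authors' implementation. It assumes names are whitespace-separated syllables in the usual Vietnamese order (family name, middle name, given name), treats every syllable between the first and the last as the middle name, and swaps in a multinomial Naive Bayes classifier with Laplace smoothing; the paper's actual classifier, N-gram order, and feature weighting may differ. The function and class names (`name_features`, `NaiveBayesNameGender`, `accuracy`) and the six training names are hypothetical and are not drawn from GenderVN1.0.

```python
import math
from collections import Counter, defaultdict


def name_features(full_name: str, n_max: int = 2) -> list[str]:
    """Hypothetical feature set: syllable n-grams of the full name plus a middle-name feature."""
    syllables = full_name.lower().split()
    feats = []
    # Position-free syllable n-grams (unigrams up to n_max-grams) over the whole name.
    for n in range(1, n_max + 1):
        for i in range(len(syllables) - n + 1):
            feats.append("ng:" + " ".join(syllables[i:i + n]))
    # Assumed middle name: everything between the family name (first) and given name (last).
    middle = syllables[1:-1]
    feats.append("mid:" + (" ".join(middle) if middle else "<none>"))
    return feats


class NaiveBayesNameGender:
    """Multinomial Naive Bayes with add-alpha smoothing (a stand-in, not the paper's model)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()          # number of training names per label
        self.feat_counts = defaultdict(Counter)  # feature counts per label
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for f in name_features(name):
                self.feat_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, name: str) -> str:
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known feature.
            score = math.log(count / total)
            denom = sum(self.feat_counts[label].values()) + self.alpha * v
            for f in name_features(name):
                if f not in self.vocab:
                    continue  # skip features never seen in training
                score += math.log((self.feat_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(model, names, labels) -> float:
    """Fraction of names whose predicted label matches the gold label."""
    correct = sum(model.predict(n) == y for n, y in zip(names, labels))
    return correct / len(labels)


# Toy, hand-made examples for illustration only.
TRAIN = [
    ("Nguyễn Văn An", "male"),
    ("Trần Văn Bình", "male"),
    ("Lê Văn Hùng", "male"),
    ("Nguyễn Thị Lan", "female"),
    ("Phạm Thị Hoa", "female"),
    ("Trần Thị Mai", "female"),
]
names, labels = zip(*TRAIN)
model = NaiveBayesNameGender().fit(names, labels)

if __name__ == "__main__":
    print(model.predict("Hoàng Văn Nam"))
    print(model.predict("Đỗ Thị Hương"))
```

The separate `mid:` feature is the point of the sketch: common Vietnamese middle names such as Văn (typically male) and Thị (typically female) are strong gender cues, and tagging them by position lets the classifier weight a middle name independently of the same syllable appearing as a family or given name.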

