Proceedings Article

Classifying User Personality Based on Media Social Posts Using Support Vector Machine Algorithm Based on DISC Approach

TL;DR: In this paper, the authors used a Support Vector Machine (SVM) with Term Frequency-Inverse Document Frequency (TF-IDF) weighting on a dataset of 109 Twitter accounts to determine each user's DISC character.
Abstract: Twitter is one of the largest social media platforms, with 326 million active users as of January 2019. Indonesia has emerged as one of the largest countries in terms of Twitter users. Every day, millions of tweets are published by Twitter users. This study analyzes tweets to infer the personalities of selected Twitter accounts using the DISC character approach. The classification algorithm used is Support Vector Machine (SVM) with Term Frequency-Inverse Document Frequency (TF-IDF) weighting on the dataset. The research starts with preprocessing stages such as data cleansing and case folding. We involved psychologists to validate the personality assessment of 109 Twitter accounts and determine each user's character. The character classes used in this study are Dominance, Influence, Steadiness, and Compliance (DISC). On the final dataset of 109 Twitter accounts, we obtained an accuracy of 36.37%, an average precision of 23.11%, and an average recall of 35.25%.
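The stages named in the abstract (data cleansing, case folding, TF-IDF weighting, and evaluation by accuracy, average precision, and average recall) can be illustrated with a minimal standard-library sketch. Everything specific here is an assumption rather than the authors' method: the cleansing rules, the TF-IDF variant (term count divided by document length, times ln(N / df)), the reading of "average" as a macro average over the four DISC classes, and the helper names `preprocess`, `tfidf`, and `macro_scores`. The SVM step itself is omitted; the weighted vectors would be passed to an SVM implementation of choice.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Case folding plus simple data cleansing (the rules are assumptions)."""
    text = tweet.lower()                       # case folding
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, emoji
    return text.split()


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def macro_scores(y_true, y_pred, labels="DISC"):
    """Accuracy, macro-averaged precision, and macro-averaged recall.

    A class with no predictions (or no true members) contributes 0.0,
    which is a convention chosen for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for _, p in pairs)
        actual = sum(t == label for t, _ in pairs)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return accuracy, sum(precisions) / len(labels), sum(recalls) / len(labels)
```

Under the zero convention above, a class that the classifier never predicts pulls the macro precision down without affecting accuracy, so the two figures can diverge; whether that explains the reported numbers is not stated in the abstract.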
References
Journal Article
22 Dec 2017 - Sensors
TL;DR: This study examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data and found that SVM produced the highest OA with the least sensitivity to the training sample sizes.
Abstract: In previous classification studies, three non-parametric classifiers, Random Forest (RF), k-Nearest Neighbor (kNN), and Support Vector Machine (SVM), were reported as the foremost classifiers at producing high accuracies. However, only a few studies have compared the performances of these classifiers with different training sample sizes for the same remote sensing images, particularly the Sentinel-2 Multispectral Imager (MSI). In this study, we examined and compared the performances of the RF, kNN, and SVM classifiers for land use/cover classification using Sentinel-2 image data. An area of 30 × 30 km² within the Red River Delta of Vietnam with six land use/cover types was classified using 14 different training sample sizes, including balanced and imbalanced, from 50 to over 1250 pixels/class. All classification results showed a high overall accuracy (OA) ranging from 90% to 95%. Among the three classifiers and 14 sub-datasets, SVM produced the highest OA with the least sensitivity to the training sample sizes, followed consecutively by RF and kNN. In relation to the sample size, all three classifiers showed a similar and high OA (over 93.85%) when the training sample size was large enough, i.e., greater than 750 pixels/class or representing an area of approximately 0.25% of the total study area. The high accuracy was achieved with both imbalanced and balanced datasets.

777 citations

Journal Article
TL;DR: This work begins to fill the gap in the bioethics literature to guide investigators and institutional review boards faced with navigating the ethical issues that such use of social media raises by first defending a nonexceptionalist methodology for assessing social media recruitment; second, examining respect for privacy and investigator transparency as key norms governing social media recruitment; and third, analyzing three relatively novel aspects of social media recruitment.
Abstract: The use of social media as a recruitment tool for research with humans is increasing, and likely to continue to grow. Despite this, to date there has been no specific regulatory guidance and there ...

303 citations

Journal Article
TL;DR: This paper first categorizes the documents using a KNN-based machine learning approach and then returns the most relevant documents to solve the text categorization problem.
Abstract: Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into different categories from a predefined set. If a document belongs to exactly one of the categories, it is a single-label classification task; otherwise, it is a multi-label classification task. TC uses several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most relevant documents.

197 citations

Journal Article
TL;DR: An online SVM model is developed to predict air pollutant levels in an advancing time series, based on the monitored air pollutant database for the Hong Kong downtown area, and the experimental comparison between the online and conventional SVM models demonstrates its effectiveness and efficiency in predicting air quality parameters with different time series.

154 citations

Journal Article
TL;DR: Comparison of classification accuracies under a nested cross-validation evaluation shows that, with one exception, all four models perform similarly on the evaluated datasets, but the four classifiers require different amounts of computational resources for both testing and training.

153 citations