
Identification and Classification of Player Types in Massive Multiplayer Online Games Using Avatar Behavior

01 Aug 2011
TL;DR: An improved methodology for classifying players (identifying deviant players such as terrorists) through multivariate analysis of data from avatar characteristics and behaviors in massive multiplayer online games (MMOGs) is developed.
Abstract: The purpose of our research is to develop an improved methodology for classifying players (identifying deviant players such as terrorists) through multivariate analysis of data from avatar characteristics and behaviors in massive multiplayer online games (MMOGs). To build our classification models, we developed three significant enhancements to the standard Generalized Regression Neural Networks (GRNN) modeling method. The first enhancement is a feature selection technique based on GRNNs, allowing us to tailor our feature set to be best modeled by GRNNs. The second enhancement is a hybrid GRNN which allows each feature to be modeled by a GRNN tailored to its data type. The third enhancement is a spread estimation technique for large data sets that is faster than exhaustive searches, yet more accurate than a standard heuristic. We applied our new techniques to a set of data from the MMOG EverQuest II to identify deviant players ('gold farmers'). The identification of gold farmers is similar to labeling terrorists in that the ratio of gold farmers to standard players is extremely small, and the in-game behaviors of a gold farmer have detectable differences from those of a standard player. Our results were promising given the difficulty of the classification process, primarily the extremely unbalanced data set with a small number of observations from the class of interest. As a screening tool, our method identifies a significantly reduced set of avatars and associated players with a much improved probability of containing a number of players displaying deviant behaviors. With further efforts at improving computing efficiencies to allow inclusion of additional features and observations within our framework, we expect even better results.
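The core estimator here, the GRNN, is a one-pass kernel regression: every training point contributes a Gaussian kernel, and the prediction is the kernel-weighted average of the training targets, governed by a single smoothing parameter (the spread). The sketch below is illustrative only, not the paper's implementation; in particular, the exhaustive leave-one-out search is a simple stand-in for the paper's faster spread-estimation technique:

```python
import math

def grnn_predict(X_train, y_train, x_query, spread):
    """GRNN / Nadaraya-Watson estimate at one query point: a
    Gaussian-kernel-weighted average of the training targets."""
    weights = []
    for xi in X_train:
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x_query))
        weights.append(math.exp(-d2 / (2.0 * spread ** 2)))
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)

def estimate_spread(X, y, candidates):
    """Pick the spread that minimizes leave-one-out squared error --
    a slow but simple stand-in for a faster spread-estimation scheme."""
    best, best_err = None, float("inf")
    for s in candidates:
        err = 0.0
        for i in range(len(y)):
            X_loo = X[:i] + X[i + 1:]   # hold out observation i
            y_loo = y[:i] + y[i + 1:]
            err += (grnn_predict(X_loo, y_loo, X[i], s) - y[i]) ** 2
        if err < best_err:
            best, best_err = s, err
    return best
```

Because there are no weights to train, "fitting" a GRNN is just choosing the spread, which is what makes the per-feature hybrid modeling described above cheap to build.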


Citations
Journal ArticleDOI
TL;DR: Three classification algorithms, multi-layer perceptron (MLP), radial basis function (RBF) and probabilistic neural network (PNN), are applied for the detection and classification of breast cancer; PNN was the best classifier, achieving accuracy rates of 100 % and 97.66 % in the training and testing phases, respectively.
Abstract: Among cancers, breast cancer causes the second-highest number of deaths in women. To reduce the high number of unnecessary breast biopsies, several computer-aided diagnosis systems have been proposed in recent years. These systems help physicians decide whether to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short-term follow-up examination instead. In clinical diagnosis, artificial intelligence techniques such as neural networks have shown great potential in this field. In this paper, three classification algorithms, multi-layer perceptron (MLP), radial basis function (RBF) and probabilistic neural network (PNN), are applied for the detection and classification of breast cancer. Decision making is performed in two stages: training the classifiers with features from the Wisconsin Breast Cancer database, then testing. The performance of the proposed structure is evaluated in terms of sensitivity, specificity, accuracy and ROC. The results revealed that PNN was the best classifier, achieving accuracy rates of 100 % and 97.66 % in the training and testing phases, respectively. MLP ranked second, achieving 97.80 % and 96.34 % classification accuracy in the training and validation phases, respectively, using the scaled conjugate gradient learning algorithm. However, RBF performed better than MLP in the training phase, and it achieved the lowest accuracy in the validation phase.

104 citations


Cites background from "Identification and Classification o..."

  • ...Example of radial basis function neural networks [11]...

    [...]

  • ...RBFs are similar to MLPs in that the neurons have weights, but they have fewer weights to train and each neuron is assigned a distribution....

    [...]

  • ...GRNNs are similar to RBFs in that each neuron is assigned a distribution, but there are no weights to train, making them relatively fast compared with RBFs and MLPs [11]....

    [...]

  • ...MLPs are very common, but require a large amount of time to train and assign weights to the neurons....

    [...]
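The sensitivity, specificity, and accuracy figures reported by this citing study all derive directly from the binary confusion matrix. A minimal sketch with hypothetical labels (1 = positive class):

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)        # true-positive rate (recall)
    specificity = tn / (tn + fp)        # true-negative rate
    accuracy = (tp + tn) / len(y_true)  # overall fraction correct
    return sensitivity, specificity, accuracy
```

For heavily unbalanced problems like gold-farmer detection, accuracy alone is misleading (always predicting "normal player" scores near 100 %), which is why sensitivity and specificity are reported separately.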

Journal ArticleDOI
TL;DR: The forms, locations, methods of analyzing and exploiting Big Data, and current research on Big Data are examined, which concerns a myriad of tangential issues, from privacy to analysis methods that will be overviewed.
Abstract: “Big Data” is an emerging term used with business, engineering, and other domains. Although Big Data is a popular term used today, it is not a new concept. However, the means in which data can be collected is more readily available than ever, which makes Big Data more relevant than ever because it can be used to improve decisions and insights within the domains it is used. The term Big Data can be loosely defined as data that is too large for traditional analysis methods and techniques. In this article, varieties of prominent but loose definitions for Big Data are shared. In addition, a comprehensive overview of issues related to Big Data is summarized. For example, this paper examines the forms, locations, methods of analyzing and exploiting Big Data, and current research on Big Data. Big Data also concerns a myriad of tangential issues, from privacy to analysis methods that will also be overviewed. Best practices will further be considered. Additionally, the epistemology of Big Data and its history will be examined, as well as technical and societal problems existing with Big Data.

32 citations


Cites background from "Identification and Classification o..."

  • ..., 2008) Social Networks (Reips & Garaizar, 2011) Sports and Games (Bednar, 2011)...

    [...]

References
Proceedings Article
Ron Kohavi
20 Aug 1995
TL;DR: The results indicate that for real-world datasets similar to the authors', the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.
Abstract: We review accuracy estimation methods and compare the two most common methods, cross-validation and bootstrap. Recent experimental results on artificial data and theoretical results in restricted settings have shown that for selecting a good classifier from a set of classifiers (model selection), ten-fold cross-validation may be better than the more expensive leave-one-out cross-validation. We report on a large-scale experiment--over half a million runs of C4.5 and a Naive-Bayes algorithm--to estimate the effects of different parameters on these algorithms on real-world datasets. For cross-validation we vary the number of folds and whether the folds are stratified or not; for bootstrap, we vary the number of bootstrap samples. Our results indicate that for real-world datasets similar to ours, the best method to use for model selection is ten-fold stratified cross-validation, even if computation power allows using more folds.

11,185 citations
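Stratification, the key recommendation above, means each fold preserves the full data set's class proportions; this matters for unbalanced data like the gold-farmer set, where an unstratified fold can easily contain no positives at all. A minimal illustrative sketch of a stratified k-fold split (not Kohavi's experimental code):

```python
from collections import defaultdict

def stratified_kfold(labels, k=10):
    """Return k folds (lists of indices) with near-equal class proportions.

    Indices of each class are dealt round-robin across the folds, so every
    fold's class ratio tracks the overall ratio."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal this class evenly over folds
    return folds
```

Each fold then serves once as the test set while the remaining k-1 folds train the model, and the k accuracy estimates are averaged.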

Journal ArticleDOI
TL;DR: The general regression neural network (GRNN) is a one-pass learning algorithm with a highly parallel structure that provides smooth transitions from one observed value to another.
Abstract: A memory-based network that provides estimates of continuous variables and converges to the underlying (linear or nonlinear) regression surface is described. The general regression neural network (GRNN) is a one-pass learning algorithm with a highly parallel structure. It is shown that, even with sparse data in a multidimensional measurement space, the algorithm provides smooth transitions from one observed value to another. The algorithmic form can be used for any regression problem in which an assumption of linearity is not justified.

4,091 citations

Journal ArticleDOI
TL;DR: A Technometrics review of Combining Pattern Classifiers: Methods and Algorithms.
Abstract: (2005). Combining Pattern Classifiers: Methods and Algorithms. Technometrics: Vol. 47, No. 4, pp. 517-518.

3,933 citations

Journal ArticleDOI
TL;DR: A probabilistic neural network that can compute nonlinear decision boundaries which approach the Bayes optimal is formed, and a four-layer neural network of the type proposed can map any input pattern to any number of classifications.

3,772 citations
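Specht's PNN is essentially a Parzen-window classifier: a pattern layer of Gaussian kernels centered on the training points, summation units that average the kernels per class, and an output unit that picks the class with the largest density estimate. A minimal sketch under that reading (hypothetical data; `sigma` is the smoothing parameter):

```python
import math

def pnn_classify(X_train, y_train, x, sigma=0.3):
    """PNN decision: average a Gaussian kernel over each class's training
    points (the summation units) and return the class with the highest
    estimated density (the output unit)."""
    scores, counts = {}, {}
    for xi, yi in zip(X_train, y_train):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        scores[yi] = scores.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(scores, key=lambda c: scores[c] / counts[c])
```

As with the GRNN, there are no weights to train; the whole training set is stored, and only the smoothing parameter must be chosen.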

Proceedings Article
29 Jun 2000
TL;DR: A new algorithm is introduced that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure.
Abstract: Despite its popularity for general clustering, K-means suffers three major shortcomings: it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, we introduce a new algorithm that efficiently searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure. The innovations include two new ways of exploiting cached sufficient statistics and a new very efficient test that in one K-means sweep selects the most promising subset of classes for refinement. This gives rise to a fast, statistically founded algorithm that outputs both the number of classes and their parameters. Experiments show this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated K-means for different values of K.

2,466 citations
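The model-selection idea above is to score each candidate K with an information criterion and keep the best. The sketch below uses plain Lloyd's algorithm on 1-D data and a crude SSE-based BIC proxy, n*ln(SSE/n) + K*ln(n), rather than the Gaussian-likelihood BIC the paper derives; it is only meant to show the mechanism:

```python
import math
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns centers and the total
    within-cluster sum of squared errors (SSE)."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # initialize centers from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in xs:  # assign each point to its nearest center
            clusters[min(range(k), key=lambda j: (x - centers[j]) ** 2)].append(x)
        # move each center to its cluster mean (keep it if the cluster emptied)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    sse = sum(min((x - c) ** 2 for c in centers) for x in xs)
    return centers, sse

def pick_k_by_bic(xs, k_max=5):
    """Choose K minimizing the crude BIC proxy n*ln(SSE/n) + K*ln(n)."""
    n = len(xs)
    best_k, best_bic = 1, float("inf")
    for k in range(1, k_max + 1):
        _, sse = kmeans_1d(xs, k)
        bic = n * math.log(sse / n + 1e-12) + k * math.log(n)
        if bic < best_bic:
            best_k, best_bic = k, bic
    return best_k
```

On two well-separated clusters such as [0.0, 0.1, 0.2, 9.9, 10.0, 10.1], the criterion strongly prefers K=2 over K=1; the SSE-based proxy is known to over-split as K grows, which is exactly why X-means uses a proper likelihood-based BIC.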