
Showing papers in "Annals of Data Science in 2019"


Journal ArticleDOI
TL;DR: A new intelligent machine learning framework for predicting the results of NBA games is proposed, aiming to discover the influential feature set that affects game outcomes by comparing the performance of models derived from different feature sets related to basketball games.
Abstract: In recent years, sports outcome prediction has gained popularity, as demonstrated by massive financial transactions in sports betting. One of the world's most popular sports, which lures betting and attracts millions of fans worldwide, is basketball, particularly the National Basketball Association (NBA) of the United States. This paper proposes a new intelligent machine learning framework for predicting the results of NBA games, aiming to discover the influential feature set that affects game outcomes. We would like to identify whether machine learning methods are applicable to forecasting the outcome of an NBA game using historical data (previous games played), and what the significant factors that affect the outcome are. To achieve these objectives, several machine learning methods that utilise different learning schemes to derive the models, including Naive Bayes, artificial neural network, and Decision Tree, are selected. By comparing the performance of the models derived from different feature sets related to basketball games, we can discover the key features that contribute to better performance, such as accuracy and efficiency of the prediction model. Based on the results analysis, the DRB (defensive rebounds) feature was deemed the most significant factor influencing the results of an NBA game. Furthermore, other crucial factors such as TPP (three-point percentage), FT (free throws made), and TRB (total rebounds) were also selected, which subsequently increased the model's prediction accuracy rate by 2–4%.
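A minimal sketch of the kind of comparison the paper describes, assuming a tabular file of historical box scores; the file name, column names (including DRB, TPP, FT, TRB) and target column are hypothetical placeholders:

```python
# Sketch: compare the three learners across candidate feature sets.
# games.csv and all column names are hypothetical.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

games = pd.read_csv("games.csv")  # historical NBA box scores
feature_sets = {
    "baseline": ["FG", "FGA", "AST", "STL"],
    "rebound-augmented": ["FG", "FGA", "AST", "STL", "DRB", "TPP", "FT", "TRB"],
}
models = {
    "naive_bayes": GaussianNB(),
    "ann": MLPClassifier(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(),
}
for fs_name, cols in feature_sets.items():
    X, y = games[cols], games["home_win"]
    for m_name, model in models.items():
        acc = cross_val_score(model, X, y, cv=10, scoring="accuracy").mean()
        print(f"{fs_name:>18} | {m_name:>13} | accuracy={acc:.3f}")
```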

64 citations


Journal ArticleDOI
TL;DR: In this article, a new three-parameter lifetime distribution named the inverse power Lomax distribution is proposed, which is the inverse form of the power Lomax distribution.
Abstract: We introduce and study a new three-parameter lifetime distribution named the inverse power Lomax distribution. The proposed distribution is obtained as the inverse form of the power Lomax distribution. Some statistical properties of the inverse power Lomax model are derived. Based on censored samples, maximum likelihood estimators of the model parameters are obtained. An intensive simulation study is performed to evaluate the behavior of the estimators in terms of their biases and mean square errors. Superiority of the new model over some well-known distributions is illustrated by means of real data sets. The results reveal that the suggested model can produce better fits than some well-known distributions.
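As a sketch of the construction, assuming the power Lomax CDF in the common parametrization, the inversion step works out as follows for $Y = 1/X$:

```latex
% Assuming the power Lomax CDF F_X(x) = 1 - (1 + x^{\beta}/\lambda)^{-\alpha}, x > 0,
% the inverse power Lomax variable Y = 1/X has CDF
F_Y(y) = P(X \ge 1/y) = 1 - F_X(1/y)
       = \left(1 + \frac{y^{-\beta}}{\lambda}\right)^{-\alpha}, \qquad y > 0 .
```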

44 citations


Journal ArticleDOI
TL;DR: In this paper, a new two-parameter distribution with decreasing failure rate is introduced, called Alpha Power Transformed Lindley (APTL), which provides better fits than the Lindley distribution and some of its known generalizations.
Abstract: The Lindley distribution has been generalized by many authors in recent years. A new two-parameter distribution with decreasing failure rate, called the Alpha Power Transformed Lindley (APTL) distribution, is introduced that provides better fits than the Lindley distribution and some of its known generalizations. The new model includes the Lindley distribution as a special case. Various properties of the proposed distribution are derived, including explicit expressions for the ordinary moments, incomplete and conditional moments, mean residual lifetime, mean deviations, L-moments, the moment generating function, cumulant generating function, characteristic function, Bonferroni and Lorenz curves, entropies, stress-strength reliability, stochastic ordering, and the statistics and distributions of sums, differences, ratios and products. The new distribution can have decreasing, increasing, and upside-down bathtub failure rate functions depending on its parameters. The model parameters are estimated by the method of maximum likelihood. We also obtain confidence intervals for the model parameters. A simulation study is carried out to examine the bias and mean squared error of the maximum likelihood estimators of the parameters. Finally, two data sets are analyzed to show how the proposed model works in practice.
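As a sketch of the construction, assuming the standard alpha power transformation applied to the Lindley CDF $G$ with parameter $\theta$:

```latex
% Alpha power transformation of a baseline CDF G (alpha > 0, alpha != 1):
F(x) = \frac{\alpha^{G(x)} - 1}{\alpha - 1},
\qquad
G(x) = 1 - \left(1 + \frac{\theta x}{1 + \theta}\right) e^{-\theta x}, \quad x > 0 ,
```

with the Lindley distribution recovered in the limit $\alpha \to 1$.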

43 citations


Journal ArticleDOI
TL;DR: It is found that correlated data, when associated with important variables, improve common regularisation methods in all aspects, and that the level of sparsity can be reflected not only in the number of important variables but also in their overall effect size and locations.
Abstract: High dimensional data are rapidly growing in many domains due to technological advances that help collect data with a large number of variables, in order to better understand a given phenomenon of interest. Particular examples appear in genomics, fMRI data analysis, large-scale healthcare analytics, text/image analysis and astronomy. In the last two decades regularisation approaches have become the methods of choice for analysing such high dimensional data. This paper aims to study the performance of regularisation methods, including the recently proposed de-biased lasso, for the analysis of high dimensional data under different sparse and non-sparse situations. Our investigation concerns prediction, parameter estimation and variable selection. We particularly study the effects of correlated variables, covariate location and effect size, which have not been well investigated. We find that correlated data, when associated with important variables, improve common regularisation methods in all aspects, and that the level of sparsity can be reflected not only in the number of important variables but also in their overall effect size and locations. The latter may be seen under a non-sparse data structure. We demonstrate that the de-biased lasso performs well, especially in low dimensional data; however, it still suffers from issues similar to those of the classical regression methods, such as multicollinearity and multiple hypothesis testing.
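A minimal sketch of one such experiment, assuming a synthetic high-dimensional design in which the important variables form an equi-correlated block (all settings are illustrative):

```python
# Sketch: how correlation among the important variables affects lasso selection.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k, rho = 100, 500, 5, 0.7  # n << p: high-dimensional regime

# Equi-correlated block for the k important variables; independent noise variables.
cov = rho * np.ones((k, k)) + (1 - rho) * np.eye(k)
block = rng.multivariate_normal(np.zeros(k), cov, size=n)
X = np.hstack([block, rng.standard_normal((n, p - k))])
beta = np.concatenate([np.full(k, 2.0), np.zeros(p - k)])
y = X @ beta + rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(fit.coef_)
print("true support:", list(range(k)), "| selected:", selected[:10])
```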

33 citations


Journal ArticleDOI
TL;DR: In this paper, the inverse Gompertz distribution with two parameters is introduced and the model parameters are estimated by the method of maximum likelihood, bootstrap, least squares, weighted least squares and Cramer-von Mises.
Abstract: In this article, we introduce the inverse Gompertz distribution with two parameters. Some statistical properties are presented, such as the hazard rate function, quantiles, probability weighted moments, skewness, kurtosis, entropies, mean residual lifetime and mean inactive lifetime. The model parameters are estimated by the methods of maximum likelihood, bootstrap, least squares, weighted least squares and Cramer-von Mises. Further, Monte Carlo simulations are carried out to compare the long-run performance of the estimators based on complete and type II right censored data. Finally, we estimate the parameters for two real data sets, one from the behavioral sciences and one giving the fatigue life in hours of ten bearings of a certain type (censored data), which show that the model fits the data better than some competing models.
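As a sketch of the construction, assuming a Gompertz baseline in a common parametrization, the CDF of the inverse variable $Y = 1/X$ follows directly:

```latex
% Assuming the Gompertz CDF F_X(x) = 1 - exp{-(alpha/beta)(e^{beta x} - 1)}, x > 0,
% the inverse Gompertz variable Y = 1/X has CDF
F_Y(y) = P(X \ge 1/y)
       = \exp\!\left\{-\frac{\alpha}{\beta}\left(e^{\beta/y} - 1\right)\right\},
\qquad y > 0 .
```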

30 citations


Journal ArticleDOI
TL;DR: In this article, a new family of probability distributions generated from a power Lindley random variable is introduced, called the power Lindley-generated family.
Abstract: In this paper, we introduce a new family of probability distributions generated from a power Lindley random variable, called the power Lindley-generated family. The new family extends several classical distributions and generalizes the odd Lindley family proposed by Silva et al. (Austrian J Stat 46:65–87, 2017). Some of the mathematical properties are obtained, involving moments, incomplete moments, the quantile function and order statistics. Four new distributions are provided as special models from the family. The model parameters of the family are estimated by the maximum likelihood technique. An application to a real data set and a simulation study are provided to demonstrate the flexibility and interest of one special model of the suggested family.

26 citations


Journal ArticleDOI
TL;DR: This paper presents a random projection scheme for cancelable iris recognition that guarantees exclusion of eyelid and eyelash effects, and masking of the original Gabor features to increase the level of security.
Abstract: This paper presents a random projection scheme for cancelable iris recognition. Instead of using the original iris features, masked versions of the features are generated through random projection in order to increase the security of the iris recognition system. The proposed framework for iris recognition includes iris localization, sector selection of the iris to avoid eyelid and eyelash effects, normalization, segmentation of the normalized iris region into halves, selection of the upper half for further reduction of eyelid and eyelash effects, feature extraction with a Gabor filter, and finally random projection. This framework guarantees exclusion of eyelid and eyelash effects, and masking of the original Gabor features to increase the level of security. Matching is performed with a Hamming Distance (HD) metric. The proposed framework achieves a promising recognition rate of 99.67% and a leading Equal Error Rate (EER) of 0.58%.
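A minimal sketch of the cancelable-template idea, assuming a real-valued Gabor feature vector has already been extracted; the key, dimensions and threshold are illustrative, and the projection matrix acts as a revocable user-specific key:

```python
# Sketch: cancelable iris template = random projection + binarization,
# matched by Hamming distance. Gabor feature extraction itself is not shown.
import numpy as np

def make_projection(key: int, d_in: int, d_out: int) -> np.ndarray:
    """User-specific projection; issuing a new key cancels the old template."""
    return np.random.default_rng(key).standard_normal((d_out, d_in))

def protect(features: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Mask the Gabor features by projection, then binarize."""
    return (P @ features > 0).astype(np.uint8)

def hamming(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean(a != b))

rng = np.random.default_rng(1)
gabor = rng.standard_normal(2048)                 # stand-in for real features
probe = gabor + 0.1 * rng.standard_normal(2048)   # same eye, slight noise

P = make_projection(key=42, d_in=2048, d_out=512)
hd = hamming(protect(gabor, P), protect(probe, P))
print(f"HD = {hd:.3f}  (accept if below a tuned threshold)")
```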

26 citations


Journal ArticleDOI
TL;DR: In this paper, a new class of bivariate distributions called the bivariate Gumbel-G family is proposed, whose marginal distributions are Gumbel-G families, and a special model of the new family is discussed in detail.
Abstract: In this paper, a new class of bivariate distributions called the bivariate Gumbel-G family is proposed, whose marginal distributions are Gumbel-G families. Several of its statistical properties are derived. After introducing the general class, a special model of the new family is discussed in detail. Bayesian and maximum likelihood techniques are used to estimate the model parameters. A simulation study is carried out to examine the bias and mean square error of the Bayesian and maximum likelihood estimators. Finally, a real data set is analyzed to illustrate the flexibility of the proposed bivariate family.

24 citations


Journal ArticleDOI
TL;DR: In this paper, a new family of continuous distributions which ensures model flexibility, based on the Frechet distribution and the Topp Leone-G family, is introduced, and the maximum likelihood estimates and the observed information matrix are obtained for the model parameters.
Abstract: A new family of continuous distributions which ensures model flexibility is introduced, based on the Frechet distribution and the Topp Leone-G family. Two special sub-models of the new family are discussed. We provide some distributional properties of this family in the general setting, such as series expansions of the density, moments, the generating function, the stress-strength model, Renyi and Shannon entropies, probability weighted moments and order statistics. Certain characterizations of the proposed family are presented. The maximum likelihood estimates and the observed information matrix are obtained for the model parameters. We assess the performance of the maximum likelihood estimators by means of a graphical simulation study. The potentiality of the new class is shown via two applications to real data sets.

24 citations


Journal ArticleDOI
TL;DR: A comparative study of fundamental and technical analysis based on different parameters, together with a comparative analysis of stock market prediction techniques such as time series analysis and machine learning algorithms, including the artificial neural network.
Abstract: The stock market is a popular investment option for investors because of its expected high returns. Stock market prediction is a complex task to achieve with the help of artificial intelligence, because stock prices depend on many factors, including trends and news in the market. However, in recent years, many creative techniques and models have been proposed and applied to efficiently and accurately forecast the behaviour of the stock market. This paper presents a comparative study of fundamental and technical analysis based on different parameters. We also discuss a comparative analysis of various prediction techniques used to predict stock prices. These strategies include technical analysis methods like time series analysis and machine learning algorithms such as the artificial neural network (ANN). Along with them, a few researchers have focused on textual analysis of stock prices by continuously analysing public sentiment from social media and other news sources. The various approaches are compared based on methodologies, datasets, and efficiency with the help of visualisation.
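A minimal sketch of the ANN strategy the survey covers, assuming daily closing prices turned into lagged features; the file name, window length and network size are illustrative:

```python
# Sketch: predict the next day's close from the previous w closes with an ANN.
# prices.csv is a hypothetical file with a "close" column.
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor

close = pd.read_csv("prices.csv")["close"].to_numpy()
w = 10  # lag window
X = np.array([close[i:i + w] for i in range(len(close) - w)])
y = close[w:]

split = int(0.8 * len(X))  # preserve time order: no shuffling
ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
ann.fit(X[:split], y[:split])
print("test R^2:", ann.score(X[split:], y[split:]))
```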

21 citations


Journal ArticleDOI
TL;DR: In this article, the statistical inference for the Gompertz distribution based on generalized progressively hybrid censored data is discussed, and the estimation of the parameters of the Gompertz distribution is discussed using the maximum likelihood method and Bayesian methods under different loss functions.
Abstract: In this paper, the statistical inference for the Gompertz distribution based on generalized progressively hybrid censored data is discussed. The estimation of the parameters of the Gompertz distribution is discussed using the maximum likelihood method and Bayesian methods under different loss functions. The existence and uniqueness of the maximum likelihood estimates are proved. Point and interval Bayesian predictions for unobserved failures from the same sample and from a future sample are derived. Monte Carlo simulation is applied to compare the proposed methods. A real data example is used to apply the methods of estimation and to construct the prediction intervals.
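A minimal sketch of the maximum likelihood step for a complete (uncensored) sample, assuming the Gompertz density $f(x) = \theta e^{\gamma x} \exp\{-(\theta/\gamma)(e^{\gamma x} - 1)\}$; the generalized progressive hybrid censoring terms of the paper are omitted:

```python
# Sketch: Gompertz MLE on a complete sample via scipy; censoring not included.
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, x):
    theta, gamma = params
    if theta <= 0 or gamma <= 0:
        return np.inf
    # log f(x) = log(theta) + gamma*x - (theta/gamma)*(exp(gamma*x) - 1)
    return -np.sum(np.log(theta) + gamma * x
                   - (theta / gamma) * np.expm1(gamma * x))

# Inverse-CDF sampling from Gompertz(theta=0.5, gamma=1.0) for a test sample:
rng = np.random.default_rng(0)
u = rng.uniform(size=200)
x = np.log1p(-np.log1p(-u) * (1.0 / 0.5)) / 1.0

res = minimize(neg_loglik, x0=[1.0, 1.0], args=(x,), method="Nelder-Mead")
print("MLE (theta, gamma):", res.x)
```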

Journal ArticleDOI
TL;DR: An efficient feature selection algorithm based on random forest is presented to improve the performance of the MLAs without sacrificing the guarantees on the accuracy while processing the large and complex datasets.
Abstract: Machine learning algorithms (MLAs) usually process large and complex datasets containing a substantial number of features to extract meaningful information about the target concept (a.k.a. class). In most cases, MLAs suffer from latency and computational complexity issues while processing such complex datasets due to the presence of lesser-weight (i.e., irrelevant or redundant) features. The computing time of the MLAs increases explosively with increases in the number of features, feature dependence, number of records, types of features, and nested feature categories present in such datasets. Appropriate feature selection before applying an MLA is a handy solution to effectively resolve the computing speed and accuracy trade-off while processing large and complex datasets. However, selecting the features that are sufficient, necessary, and highly correlated with the target concept is very challenging. This paper presents an efficient feature selection algorithm based on random forest to improve the performance of MLAs without sacrificing the guarantees on accuracy while processing large and complex datasets. The proposed feature selection algorithm yields unique features that are closely related to the target concept (i.e., class). The proposed algorithm significantly reduces the computing time of MLAs without much degrading the accuracy while learning the target concept from large and complex datasets. The simulation results fortify the efficacy and effectiveness of the proposed algorithm.
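A minimal sketch of the underlying idea, using scikit-learn's impurity-based random forest importances as the ranking criterion; the paper's actual selection rule may differ, and the dataset and cutoff here are illustrative:

```python
# Sketch: rank features by random-forest importance, keep the top ones,
# and compare downstream accuracy on the full vs. reduced feature set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=50, n_informative=8,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
keep = np.argsort(rf.feature_importances_)[::-1][:8]  # top-8 features

full = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
reduced = cross_val_score(RandomForestClassifier(random_state=0),
                          X[:, keep], y, cv=5).mean()
print(f"accuracy full={full:.3f}  reduced={reduced:.3f}  ({len(keep)}/50 features)")
```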

Journal ArticleDOI
TL;DR: In this paper, a new generator of continuous distributions called Exponentiated Generalized Marshall-Olkin-G family with three additional parameters is proposed, which contains several known distributions as sub models.
Abstract: A new generator of continuous distributions called the Exponentiated Generalized Marshall–Olkin-G family, with three additional parameters, is proposed. This family of distributions contains several known distributions as sub-models. The probability density function and cumulative distribution function are expressed as infinite mixtures of the Marshall–Olkin distribution. Important properties like the quantile function, order statistics, moment generating function, probability weighted moments, entropy and shapes are investigated. The maximum likelihood method to estimate the model parameters is presented. A simulation study to assess the performance of the maximum likelihood estimation is briefly discussed. A distribution from this family is compared with two sub-models and some recently introduced lifetime models by considering three real-life data fitting applications.

Journal ArticleDOI
TL;DR: A minimum redundancy and maximum variance based unsupervised band selection methodology is proposed and is compared with four other existing state-of-the-art methods in the same field in terms of OA and execution time for evaluating the performance.
Abstract: The contiguous narrow bands of hyperspectral images greatly increase computational complexity. Redundancy reduction is therefore necessary. Here, a minimum redundancy and maximum variance based unsupervised band selection methodology is proposed. Discrete wavelet transformation is applied to the data to reduce spatial redundancy without much affecting the overall band correlations. This in turn makes the process more time-efficient and noise-resilient. Highly correlated bands are considered similar, and the one with higher variance is accepted as being more discriminating. Finally, classification is performed with the selected bands and the overall accuracy (OA) is calculated. The proposed method is compared with four other existing state-of-the-art methods in the same field in terms of OA and execution time for evaluating the performance.
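A minimal sketch of the min-redundancy/max-variance selection step, assuming the hyperspectral cube has already been wavelet-smoothed and flattened to a pixels-by-bands matrix; the correlation threshold is illustrative:

```python
# Sketch: greedy unsupervised band selection. Bands highly correlated with an
# already-kept band are dropped; higher-variance bands are preferred.
import numpy as np

def select_bands(cube, corr_thresh=0.95):
    """cube: (n_pixels, n_bands) array, e.g. wavelet-denoised reflectances."""
    order = np.argsort(cube.var(axis=0))[::-1]       # highest variance first
    corr = np.abs(np.corrcoef(cube, rowvar=False))   # band-to-band correlation
    selected = []
    for band in order:
        if all(corr[band, s] < corr_thresh for s in selected):
            selected.append(int(band))
    return selected

rng = np.random.default_rng(0)
fake_cube = rng.standard_normal((5000, 64))          # stand-in for real data
print("bands kept:", select_bands(fake_cube))
```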

Journal ArticleDOI
TL;DR: In this paper, a cubic transmuted Weibull (CTW) distribution has been proposed by using the general family of transmuted distributions introduced by Rahman et al. The parameter estimation and inference procedure for the proposed distribution have been discussed.
Abstract: In this paper, a cubic transmuted Weibull (CTW) distribution is proposed by using the general family of transmuted distributions introduced by Rahman et al. (Pak J Stat Oper Res 14:451–469, 2018). We explore the proposed CTW distribution in detail and study its statistical properties as well. The parameter estimation and inference procedure for the proposed distribution are discussed. We conduct a simulation study to observe the performance of the estimation technique. Finally, we consider two real-life data sets to investigate the practicality of the proposed CTW distribution.
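As a sketch of the construction, one common form of a cubic rank transmutation of a baseline CDF $G$ is given below; the exact parameter constraints in Rahman et al. may differ, and taking $G$ to be the Weibull CDF yields the CTW distribution:

```latex
% One common cubic rank transmutation of a baseline CDF G:
F(x) = \lambda_1 G(x) + (\lambda_2 - \lambda_1)\, G(x)^2 + (1 - \lambda_2)\, G(x)^3,
\qquad
G(x) = 1 - e^{-(x/\beta)^{\alpha}}, \quad x > 0 .
```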

Journal ArticleDOI
TL;DR: In this paper, a new one-parameter lifetime distribution named the Burr-Hatke exponential (BHE) distribution is introduced, and Monte Carlo simulations are performed to compare the performances of the obtained estimators in the mean square error sense.
Abstract: In this paper, we introduce a new one-parameter lifetime distribution, named the Burr–Hatke exponential (BHE) distribution, as an alternative to the exponential distribution. Classical and Bayesian estimation procedures for the BHE model parameter are discussed based on Type-II hybrid censored data. Monte Carlo simulations are performed to compare the performances of the obtained estimators in the mean square error sense. Two real data sets are analyzed for illustrative purposes. Additionally, a new log-location regression model based on the new distribution is introduced and studied.
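For orientation, the BHE distribution is often written with the one-parameter CDF below; this parametrization is our assumption rather than a quotation from the paper:

```latex
% Assumed Burr--Hatke exponential CDF with parameter lambda > 0:
F(x) = 1 - \frac{e^{-\lambda x}}{1 + \lambda x}, \qquad x > 0 .
```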

Journal ArticleDOI
TL;DR: In this paper, a new probability distribution, named inverse xgamma (IXG) distribution, was proposed and different mathematical and statistical properties, viz., reliability characteristics, inverse moments, quantile function, mean inverse residual life, stress-strength reliability, stochastic ordering and order statistics of the proposed distribution have been derived and discussed.
Abstract: The paper proposes a new probability distribution, named inverse xgamma (IXG) distribution. Different mathematical and statistical properties, viz., reliability characteristics, inverse moments, quantile function, mean inverse residual life, stress-strength reliability, stochastic ordering and order statistics of the proposed distribution have been derived and discussed. Estimation of the parameter of IXG distribution has been approached by different methods, namely, maximum likelihood estimation, least squares estimation, weighted least squares estimation, Cramer–von-Mises estimation and maximum product of spacing estimation (MPSE). A simulation study has been carried out to compare the performance of these estimators in terms of their mean squared errors. Asymptotic confidence interval of the parameter in terms of average widths and coverage probabilities is also obtained using MPSE of the parameter. Finally, a data set is used to demonstrate the applicability of IXG distribution in real life situations.

Journal ArticleDOI
TL;DR: In this paper, a new lifetime distribution based on the general odd hyperbolic cosine-FG model is introduced, which is shown to have better performance than other fundamental statistical distributions.
Abstract: In the present paper, we introduce a new lifetime distribution based on the general odd hyperbolic cosine-FG model. Some important properties of the proposed model, including the survival function, quantile function, hazard function and order statistics, are obtained. In addition, estimation of the unknown parameters of this model is examined from the perspective of both classical and Bayesian statistics. Moreover, a real data set is studied; point and interval estimates of all parameters are obtained by maximum likelihood, bootstrap (parametric and non-parametric) and Bayesian procedures. Finally, the superiority of the proposed model, with the exponential distribution as parent, over other fundamental statistical distributions is shown via the example of real observations.

Journal ArticleDOI
TL;DR: A multi-objective inventory model under both stock-dependent demand rate and holding cost rate with fuzzy random coefficients is investigated to determine the optimal order quantity and inventory level such that the total profit is maximized and the wastage cost is minimized for the retailer.
Abstract: In this paper, we investigate a multi-objective inventory model under both stock-dependent demand rate and holding cost rate with fuzzy random coefficients. A chance-constrained fuzzy random multi-objective model and a traditional solution procedure based on an interactive fuzzy satisfying method are discussed. In addition, the technique of fuzzy random simulation is applied to deal with general fuzzy random objective functions and fuzzy random constraints, which are usually difficult to convert into their crisp equivalents. The purpose of this study is to determine the optimal order quantity and inventory level such that the total profit is maximized and the wastage cost is minimized for the retailer. Finally, an illustrative example is given to show the application of the proposed model.

Journal ArticleDOI
TL;DR: The MLEs and corresponding Bayes estimators are compared in terms of their risks based on simulated samples from the Rayleigh distribution, and two sets of real data are analyzed to show the applicability of the methods.
Abstract: In this paper, we propose maximum likelihood estimators (MLEs) and Bayes estimators of the parameters of step-stress partially accelerated life testing for the Rayleigh distribution in the presence of progressive type-II censoring with a binomial removal scheme, under the squared error, general entropy, and linear exponential (LINEX) loss functions. The MLEs and corresponding Bayes estimators are compared in terms of their risks based on simulated samples from the Rayleigh distribution. We also analyze two sets of real data to show the applicability of the methods.
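For reference, the three loss functions named here are commonly defined as follows for an estimator $\delta$ of $\theta$ (standard textbook forms; the paper's constants may be scaled differently):

```latex
% Squared error (SE), general entropy (GE), and linear exponential (LINEX) losses:
L_{\mathrm{SE}}(\delta, \theta) = (\delta - \theta)^2, \qquad
L_{\mathrm{GE}}(\delta, \theta) = \Bigl(\frac{\delta}{\theta}\Bigr)^{q}
                                  - q \log\frac{\delta}{\theta} - 1, \qquad
L_{\mathrm{LINEX}}(\delta, \theta) = e^{c(\delta - \theta)} - c(\delta - \theta) - 1 .
```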

Journal ArticleDOI
TL;DR: In this paper, the authors analyzed the determinant factors of Ethiopia's coffee exports (ECE) performance, in the dimension of export sales, via a more realistic model application, dynamic panel gravity model.
Abstract: Ethiopia's coffee export earnings as a percentage share of total exports have been rapidly waning over the last decades, even though coffee is the country's first commodity in currency grossing. Hence, this study analyses the determinant factors of Ethiopia's coffee export (ECE) performance, in the dimension of export sales, via a more realistic model, the dynamic panel gravity model. It commences with the disintegration of the determinants into supply- and demand-side factors. It uses short panel data that comprise 71 consistent importing countries of Ethiopia's coffee over a period of 11 years, from 2005 to 2015. The Harris–Tzavalis panel unit root test was applied to each variable, and the first-difference transformation was applied to the variables that had a unit root. A system of linear dynamic panel gravity equations was specified and estimated with the two-step generalized method of moments estimation approach. The model results suggest that lagged ECE performance, the real gross domestic product (GDP) of importing countries, the Ethiopian population, Ethiopian real GDP, openness to trade of importing countries, Ethiopian institutional quality, and weighted distance are the determinant factors of Ethiopia's coffee export performance. The study also suggests policies that would promote institutional quality, permit favorable market environments, strengthen supply capacity and trade liberalization, and target destinations with relatively cheaper transportation costs in order to improve Ethiopia's coffee export performance.
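A stylized form of the dynamic panel gravity equation such a study estimates; the regressors mirror the determinants listed above, but the notation is illustrative rather than quoted from the paper:

```latex
% Stylized dynamic panel gravity model, estimated by two-step system GMM:
\ln X_{jt} = \rho \ln X_{j,t-1}
           + \beta_1 \ln GDP_{jt} + \beta_2 \ln GDP^{ETH}_{t}
           + \beta_3 \ln POP^{ETH}_{t} + \beta_4\, OPEN_{jt}
           + \beta_5\, INST^{ETH}_{t} + \beta_6 \ln DIST_{j}
           + \mu_j + \varepsilon_{jt},
```

where $X_{jt}$ is the value of coffee exports to importer $j$ in year $t$, $\mu_j$ is an importer effect, and $\rho$ captures the persistence of past export performance.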

Journal ArticleDOI
TL;DR: An improved LDA topic model based on partition (LDAP) is proposed, which preserves the benefits of the original LDA but also refines the modeled granularity from the document level to the semantic topic level, which is particularly suitable for the topic modeling of medium and long texts.
Abstract: Latent Dirichlet Allocation (LDA) is a topic model that represents a document as a distribution of multiple topics. It expresses each topic as a distribution of multiple words by mining semantic relationships hidden in text. However, traditional LDA ignores some of the semantic features hidden inside the document semantic structure of medium and long texts. Instead of using the original LDA to model the topic at the document level, it is better to refine the document into different semantic topic units. In this paper, we propose an improved LDA topic model based on partition (LDAP) for medium and long texts. LDAP not only preserves the benefits of the original LDA but also refines the modeled granularity from the document level to the semantic topic level, which is particularly suitable for the topic modeling of the medium and long text. The extensive experimental classification results on Fudan University corpus and Sougou Lab corpus demonstrate that LDAP achieves better performance compared with other topic models, such as LDA, HDP, LSA and doc2vec.
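A minimal sketch of the baseline modeling step using gensim's LDA; the partitioning into semantic topic units (the "P" in LDAP) is stubbed out here as a simple paragraph split:

```python
# Sketch: LDA over partitioned documents. LDAP partitions by semantic topic
# units; plain paragraph breaks stand in for those units in this sketch.
from gensim import corpora, models

docs = ["first paragraph about sports ...\n\nsecond paragraph about finance ..."]
units = [p.split() for d in docs for p in d.split("\n\n")]  # crude partition

dictionary = corpora.Dictionary(units)
corpus = [dictionary.doc2bow(u) for u in units]
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```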

Journal ArticleDOI
TL;DR: In this article, the authors use the Marshall Olkin alpha power transformation to introduce a new generalized Marshall Olkin alpha power inverse exponential (MOAPIE) distribution, whose characterization and statistical properties, such as reliability, entropy and order statistics, are obtained.
Abstract: In this paper, we use the Marshall Olkin alpha power transformation method to introduce a new generalized Marshall Olkin alpha power inverse exponential (MOAPIE) distribution. Its characterization and statistical properties, such as reliability, entropy and order statistics, are obtained. Moreover, estimation of the MOAPIE parameters is discussed using the maximum likelihood estimation method. Finally, an application of the proposed new distribution to real data representing the survival times in days of guinea pigs injected with different doses of tubercle bacilli is given, and its goodness-of-fit is demonstrated. In addition, comparisons to other models are carried out to illustrate the flexibility of the proposed model.

Journal ArticleDOI
TL;DR: In this paper, a new family of distributions called the exponentiated generalized power series family is proposed and studied, and statistical properties such as stochastic order, quantile function, entropy, mean residual life and order statistics are derived.
Abstract: In this paper, a new family of distributions called the exponentiated generalized power series family is proposed and studied. Statistical properties such as stochastic order, the quantile function, entropy, mean residual life and order statistics are derived. Bivariate and multivariate extensions of the family are proposed. The method of maximum likelihood is used for the estimation of the parameters. Some special distributions from the family are defined, and their applications are demonstrated with real data sets.

Journal ArticleDOI
TL;DR: In this article, a more detailed statistical analysis of the dependence across Nigeria's inflation, exchange rate, and stock market returns is provided by means of copulas, and a positive relationship is found to exist between Nigeria's inflation and the exchange rate of the Nigerian Naira versus the USD.
Abstract: For the first time, a more detailed statistical analysis of the dependence across Nigeria's inflation, exchange rate, and stock market returns is provided by means of copulas. A positive relationship is found to exist between Nigeria's inflation and the exchange rate of the Nigerian Naira versus the USD, a negligible positive relationship exists between Nigeria's inflation and her stock market returns, and a weak positive relationship exists between the exchange rate of the Nigerian Naira versus the USD and her stock market returns. Eighteen-month forecasts for each of the time series and value at risk estimates for the Nigerian stock market returns are given. The Nigerian stock market is confirmed to be weak-form inefficient.
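A minimal sketch of the dependence-measurement step, fitting a Gaussian copula by the usual rank transformation; the two series below are simulated stand-ins, not the actual Nigerian data:

```python
# Sketch: Gaussian-copula correlation between two series via
# rank -> uniform -> normal-scores transformation.
import numpy as np
from scipy.stats import norm, rankdata

def copula_corr(x, y):
    u = rankdata(x) / (len(x) + 1)  # pseudo-observations in (0, 1)
    v = rankdata(y) / (len(y) + 1)
    return float(np.corrcoef(norm.ppf(u), norm.ppf(v))[0, 1])

rng = np.random.default_rng(0)
inflation = rng.normal(12, 2, 132)            # stand-in monthly series
fx = 0.6 * inflation + rng.normal(0, 2, 132)  # series correlated with inflation
print(f"copula correlation: {copula_corr(inflation, fx):.2f}")
```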

Journal ArticleDOI
TL;DR: A Markov Chain Monte Carlo method is presented to obtain the posterior summaries of the Chen distribution assuming upper record values and a comparison between the Bayesian and frequentist approaches is given.
Abstract: This article presents the Bayesian and classical inferences for the Chen distribution assuming upper record values. As the posterior distribution is not in closed form, a Markov Chain Monte Carlo method is presented to obtain the posterior summaries. To assess the effect of the prior on the estimated parameters, a sensitivity analysis is also part of this study. Moreover, a comparison between the Bayesian and frequentist approaches is given. Besides the simulation studies, a real data example is also discussed to show the application of the study.
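For context, the likelihood based on the first $m$ upper record values $r_1 < \cdots < r_m$ has the standard form below, with $f$ and $F$ the Chen density and CDF:

```latex
% Likelihood from upper records r_1 < ... < r_m:
L(\theta \mid r_1, \ldots, r_m)
  = f(r_m; \theta) \prod_{i=1}^{m-1} \frac{f(r_i; \theta)}{1 - F(r_i; \theta)} .
```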

Journal ArticleDOI
TL;DR: In this paper, the expected values, second moments, variances and covariances of order statistics from samples of sizes up to 10 for various values of the parameters were tabulated, and the best linear unbiased estimates of the location and scale parameters based on Type-II right-censored samples were obtained.
Abstract: The power Lindley distribution was proposed recently by Ghitany et al. (Comput Stat Data Anal 64:20–33, 2013) as a simple and useful reliability model for analysing lifetime data. This model provides more flexibility than the Lindley distribution in terms of the shape of the density and hazard rate functions as well as its skewness and kurtosis. For this distribution, exact explicit expressions for the single moments, product moments, marginal moment generating functions and joint moment generating functions of order statistics are derived. By using these relations, we tabulate the expected values, second moments, variances and covariances of order statistics from samples of sizes up to 10 for various values of the parameters. We then use these moments to obtain the best linear unbiased estimates of the location and scale parameters based on Type-II right-censored samples. In addition, we carry out some numerical illustrations through Monte Carlo simulations to show the usefulness of the findings. Finally, we apply the findings of the paper to a real data set.

Journal ArticleDOI
TL;DR: In this paper, a new family of distributions, called the generalized Burr XII power series class, was defined and studied by compounding the generalized Burr XII and power series distributions, and the maximum likelihood estimation method was used to estimate the model parameters.
Abstract: We define and study a new family of distributions, called the generalized Burr XII power series class, by compounding the generalized Burr XII and power series distributions. Several properties of the new family are derived. The maximum likelihood estimation method is used to estimate the model parameters. The importance and potentiality of the new family are illustrated by means of three applications to real data sets.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a new lifetime distribution, the transmuted extended exponential distribution, which generalizes the extended exponential distribution with an additional parameter using the quadratic rank transmutation map.
Abstract: We introduce a new lifetime distribution, namely the transmuted extended exponential distribution, which generalizes the extended exponential distribution proposed by Nadarajah and Haghighi (Statistics 45:543–558, 2011) with an additional parameter, using the quadratic rank transmutation map studied by Shaw and Buckley (The alchemy of probability distributions: beyond Gram-Charlier expansions, and a skew-kurtotic-normal distribution from a rank transmutation map, 2009. arXiv:0901.0434), to provide greater flexibility in modeling data from a practical point of view. In this paper, our main focus is on estimation from a frequentist point of view; nevertheless, some statistical and reliability characteristics of the model are derived. We briefly describe different estimation procedures, namely the method of maximum likelihood estimation, maximum product of spacings estimation and least squares estimation. Monte Carlo simulations are performed to compare the performance of the proposed methods of estimation for both small and large samples. Finally, the potentiality of the model is analyzed by means of one real data set.
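As a sketch, combining the quadratic rank transmutation map with the Nadarajah-Haghighi baseline gives a CDF of the form below; both ingredients are written in their standard forms, though the paper's notation may differ:

```latex
% Quadratic rank transmutation of a baseline CDF G, |lambda| <= 1:
F(x) = (1 + \lambda)\, G(x) - \lambda\, G(x)^2,
\qquad
G(x) = 1 - e^{\,1 - (1 + \alpha x)^{\beta}}, \quad x > 0 .
```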

Journal ArticleDOI
TL;DR: This research provides an effective solution through crime ontologies and an enhanced ant-based crawler to extract the characteristics of and relationships among Web pages for the recreation and extraction of crime scenarios.
Abstract: Crime analysis is one of the important activities of information security agencies. They collect crime data from the Web with appropriate procedures and tools. The main challenge many of these agencies face is to analyze the increasing amount of crime information efficiently and accurately. The cybercrime information presented on Web pages is in the form of text and needs to be analyzed and investigated. Although some approaches have been presented to support Web crime mining, issues of efficiency and effectiveness still exist. Because much of the crime information on the Web can be described by ontologies, semantic technology can be used to study the patterns and processes of Web crimes. Therefore, in order to extract and reveal Internet crime, an improved Web ontology is useful for extracting the characteristics of and relationships among Web pages for the recreation and extraction of crime scenarios. The main purpose of this study is to develop an optimized ontology-based approach for Web crime mining. The proposed framework was designed based on an enhanced crime ontology using an ant-miner focused crawler, which draws inspiration from biological research on ant foraging behavior. Ant colony optimization was used to optimize the proposed framework. The proposed work was evaluated based on accuracy criteria. The evaluation results show that this research provides an effective solution through crime ontologies and an enhanced ant-based crawler.