Journal ISSN: 1680-743X

Journal of data science 

People's University of China
About: Journal of data science is an academic journal published by People's University of China. It publishes mainly in the areas of computer science and cluster analysis. Its ISSN is 1680-743X. Over its lifetime, the journal has published 1215 papers, which have received 12889 citations. It is also known as JDS (Online) and JDS (Print).


Papers
Journal Article
TL;DR: In this paper, the performance of Singular Spectrum Analysis (SSA) has been considered by applying it to a well-known time series data set, namely, monthly accidental deaths in the USA.
Abstract: In recent years Singular Spectrum Analysis (SSA), used as a powerful technique in time series analysis, has been developed and applied to many practical problems. In this paper, the performance of the SSA technique has been considered by applying it to a well-known time series data set, namely, monthly accidental deaths in the USA. The results are compared with those obtained using Box-Jenkins SARIMA models, the ARAR algorithm and the Holt-Winter algorithm (as described in Brockwell and Davis (2002)). The results show that the SSA technique gives a much more accurate forecast than the other methods indicated above.

452 citations

Journal Article
TL;DR: Motivated by the importance of the Weibull distribution for problems in reliability, this paper studies mathematical properties of the wider Weibull-G family of distributions and illustrates the family with two applications to real data.
Abstract: The Weibull distribution is the most important distribution for problems in reliability. We study some mathematical properties of the new wider Weibull-G family of distributions. Some special models in the new family are discussed. The properties derived hold for any distribution in this family. We obtain general explicit expressions for the quantile function, ordinary and incomplete moments, generating function and order statistics. We discuss the estimation of the model parameters by maximum likelihood and illustrate the potential of the extended family with two applications to real data.

391 citations
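
The abstract above mentions an explicit quantile function for the Weibull-G family but does not state any formulas. As a minimal sketch, assume the family's CDF has the form F(x) = 1 - exp(-alpha * [G(x) / (1 - G(x))]^beta) for a baseline CDF G; this form, the parameter names alpha and beta, and the exponential baseline with rate lam are all assumptions made for illustration, not details taken from this listing. Under that assumption the quantile function follows by inverting the CDF:

```python
import math

def weibull_g_cdf(x, alpha, beta, base_cdf):
    """CDF of the assumed Weibull-G form built on a baseline CDF G."""
    g = base_cdf(x)
    if g <= 0.0:
        return 0.0
    if g >= 1.0:
        return 1.0
    odds = g / (1.0 - g)  # baseline odds G / (1 - G)
    return 1.0 - math.exp(-alpha * odds ** beta)

def weibull_g_quantile(u, alpha, beta, base_quantile):
    """Quantile obtained by inverting the assumed CDF, for 0 < u < 1."""
    t = (-math.log(1.0 - u) / alpha) ** (1.0 / beta)  # baseline odds at level u
    return base_quantile(t / (1.0 + t))               # map odds back to a probability

# Hypothetical baseline: exponential distribution with rate lam.
lam = 1.5

def exp_cdf(x):
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def exp_quantile(p):
    return -math.log(1.0 - p) / lam

# Round trip: the CDF evaluated at the median should give back 0.5.
median = weibull_g_quantile(0.5, alpha=2.0, beta=0.7, base_quantile=exp_quantile)
round_trip = weibull_g_cdf(median, alpha=2.0, beta=0.7, base_cdf=exp_cdf)
```

Swapping in a different baseline only requires a matching pair of CDF and quantile functions, which is the sense in which properties of such a family carry over to any member.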

Journal Article
TL;DR: A modification of the Greedy Equivalence Search algorithm to rapidly find the Markov Blanket of any variable in a high dimensional system is described.
Abstract: We describe two modifications that parallelize and reorganize caching in the well-known Greedy Equivalence Search (GES) algorithm for discovering directed acyclic graphs on random variables from sample values. We apply one of these modifications, the Fast Greedy Search (FGS) assuming faithfulness, to an i.i.d. sample of 1,000 units to recover with high precision and good recall an average degree 2 directed acyclic graph (DAG) with one million Gaussian variables. We describe a modification of the algorithm to rapidly find the Markov Blanket of any variable in a high dimensional system. Using 51,000 voxels that parcellate an entire human cortex, we apply the FGS algorithm to Blood Oxygenation Level Dependent (BOLD) time series obtained from resting state fMRI.

247 citations
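
For readers unfamiliar with the term used in the abstract above: in a DAG, the Markov blanket of a variable consists of its parents, its children, and the other parents of its children. The sketch below only reads the blanket off a DAG that is already known; it is not the FGS-based search described in the paper, and the edge list and function name are hypothetical.

```python
def markov_blanket(edges, target):
    """Return the Markov blanket of `target` in a DAG given as (parent, child) pairs."""
    parents = {a for a, b in edges if b == target}
    children = {b for a, b in edges if a == target}
    # Other parents of the target's children ("spouses").
    spouses = {a for a, b in edges if b in children and a != target}
    return parents | children | spouses

# Hypothetical DAG: E -> A -> T -> C <- B, and C -> D.
edges = [("E", "A"), ("A", "T"), ("T", "C"), ("B", "C"), ("C", "D")]
blanket_of_t = markov_blanket(edges, "T")
```

Here E and D fall outside the blanket of T because they are connected to T only through A and C respectively.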

Journal Article
TL;DR: This review shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing and highlights some research and development directions on Apache Spark for big data analytics.
Abstract: Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project, with an increasing number of contributors from both academia and industry, it is difficult for researchers to comprehend the full body of development and research behind Apache Spark, especially those who are beginners in this area. In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics.

241 citations

Journal Article
TL;DR: In this article, a zero-inflated generalized Poisson (ZIGP) regression model is proposed to model domestic violence data with too many zeros, together with a score test of whether the number of zeros is too large for the generalized Poisson model to fit adequately.
Abstract: The generalized Poisson regression model has been used to model dispersed count data. It is a good competitor to the negative binomial regression model when the count data is over-dispersed. Zero-inflated Poisson and zero-inflated negative binomial regression models have been proposed for situations where the data generating process results in too many zeros. In this paper, we propose a zero-inflated generalized Poisson (ZIGP) regression model to model domestic violence data with too many zeros. Estimation of the model parameters using the method of maximum likelihood is provided. A score test is presented to test whether the number of zeros is too large for the generalized Poisson model to adequately fit the domestic violence data.

229 citations
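
The abstract above does not give the ZIGP probability function, so the following is only a sketch under stated assumptions: it uses one common mean parameterization of the generalized Poisson distribution (mean mu, dispersion alpha >= 0) and mixes it with an extra point mass phi at zero. The names mu, alpha and phi are illustrative and may not match the paper's notation, and no regression structure is included.

```python
import math

def gp_log_pmf(y, mu, alpha):
    """Log-pmf of a generalized Poisson with mean mu and dispersion alpha >= 0 (assumed form)."""
    theta = mu / (1.0 + alpha * mu)
    return (y * math.log(theta)
            + (y - 1) * math.log(1.0 + alpha * y)
            - math.lgamma(y + 1)
            - theta * (1.0 + alpha * y))

def zigp_pmf(y, mu, alpha, phi):
    """Zero-inflated mixture: extra mass phi at zero, weight (1 - phi) on the GP part."""
    base = math.exp(gp_log_pmf(y, mu, alpha))
    if y == 0:
        return phi + (1.0 - phi) * base
    return (1.0 - phi) * base

# Hypothetical parameter values, chosen only to exercise the functions.
probs = [zigp_pmf(y, mu=2.0, alpha=0.1, phi=0.3) for y in range(201)]
total_mass = sum(probs)
mean_count = sum(y * p for y, p in enumerate(probs))
```

With alpha = 0 the generalized Poisson part reduces to an ordinary Poisson, and with phi = 0 the mixture reduces to the generalized Poisson itself, which is the nesting that a test for excess zeros relies on.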

Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2023    25
2022    53
2021    665
2020    101
2019    82
2018    87