Journal•ISSN: 1680-743X

Journal of data science

People's University of China

About: Journal of data science is an academic journal published by People's University of China. The journal publishes mainly in the areas of computer science and cluster analysis. Its ISSN is 1680-743X. Over its lifetime, the journal has published 1215 publications, which have received 12889 citations. The journal is also known as JDS (Online) and JDS (Print).

Topics: Computer science, Cluster analysis, Estimator, Population, Regression analysis

Papers

Journal Article

Singular Spectrum Analysis: Methodology and Comparison

Hossein Hassani

12 Jul 2021-Journal of data science

TL;DR: In this paper, the performance of Singular Spectrum Analysis (SSA) has been considered by applying it to a well-known time series data set, namely, monthly accidental deaths in the USA.

Abstract: In recent years Singular Spectrum Analysis (SSA), used as a powerful technique in time series analysis, has been developed and applied to many practical problems. In this paper, the performance of the SSA technique has been considered by applying it to a well-known time series data set, namely, monthly accidental deaths in the USA. The results are compared with those obtained using Box-Jenkins SARIMA models, the ARAR algorithm and the Holt-Winter algorithm (as described in Brockwell and Davis (2002)). The results show that the SSA technique gives a much more accurate forecast than the other methods indicated above.

452 citations

Journal Article

The Weibull-G Family of Probability Distributions

Marcelo Bourguignon, Rodrigo B. Silva, Gauss M. Cordeiro

09 Mar 2021-Journal of data science

TL;DR: The Weibull distribution is the most important distribution for problems in reliability. This paper studies mathematical properties of the wider Weibull-G family of distributions.

Abstract: The Weibull distribution is the most important distribution for problems in reliability. We study some mathematical properties of the new wider Weibull-G family of distributions. Some special models in the new family are discussed. The properties derived hold for any distribution in this family. We obtain general explicit expressions for the quantile function, ordinary and incomplete moments, generating function and order statistics. We discuss the estimation of the model parameters by maximum likelihood and illustrate the potential of the extended family with two applications to real data.

391 citations
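
As an illustration of the quantile function mentioned in the Weibull-G abstract above, here is a minimal Python sketch. It assumes the family's cumulative distribution function has the form F(x) = 1 - exp(-alpha * [G(x) / (1 - G(x))]^beta) for a baseline CDF G, which is how the family is usually stated; the listing itself does not give the formula. The function names and the exponential baseline are hypothetical choices made only for this example.

```python
import math

def weibull_g_cdf(x, alpha, beta, base_cdf):
    """Assumed Weibull-G CDF: 1 - exp(-alpha * odds**beta), odds = G / (1 - G)."""
    g = base_cdf(x)
    if g <= 0.0:
        return 0.0
    if g >= 1.0:
        return 1.0
    odds = g / (1.0 - g)
    return 1.0 - math.exp(-alpha * odds ** beta)

def weibull_g_quantile(u, alpha, beta, base_quantile):
    """Invert the assumed CDF: solve for the odds t, then map back through G^{-1}."""
    t = (-math.log(1.0 - u) / alpha) ** (1.0 / beta)
    return base_quantile(t / (1.0 + t))

# Hypothetical baseline for illustration: a unit-rate exponential distribution.
def exp_cdf(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exp_quantile(p):
    return -math.log(1.0 - p)

if __name__ == "__main__":
    median = weibull_g_quantile(0.5, alpha=2.0, beta=1.5, base_quantile=exp_quantile)
    print(median, weibull_g_cdf(median, 2.0, 1.5, exp_cdf))
```

Passing the baseline as a function keeps the sketch independent of any particular G, which mirrors the abstract's point that the derived properties hold for any distribution in the family.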

Journal Article

A million variables and more: the Fast Greedy Equivalence Search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images

Joseph D. Ramsey¹, Madelyn R. K. Glymour¹, Ruben Sanchez-Romero¹, Clark Glymour¹

Carnegie Mellon University¹

01 Mar 2017-Journal of data science

TL;DR: A modification of the Greedy Equivalence Search algorithm to rapidly find the Markov Blanket of any variable in a high dimensional system is described.

Abstract: We describe two modifications that parallelize and reorganize caching in the well-known Greedy Equivalence Search (GES) algorithm for discovering directed acyclic graphs on random variables from sample values. We apply one of these modifications, the Fast Greedy Search (FGS) assuming faithfulness, to an i.i.d. sample of 1,000 units to recover with high precision and good recall an average degree 2 directed acyclic graph (DAG) with one million Gaussian variables. We describe a modification of the algorithm to rapidly find the Markov Blanket of any variable in a high dimensional system. Using 51,000 voxels that parcellate an entire human cortex, we apply the FGS algorithm to Blood Oxygenation Level Dependent (BOLD) time series obtained from resting state fMRI.

247 citations

Journal Article

Big data analytics on Apache Spark

Salman Salloum¹, Ruslan Dautov¹, Xiaojun Chen¹, Patrick Xiaogang Peng¹, Joshua Zhexue Huang¹

Shenzhen University¹

13 Oct 2016-Journal of data science

TL;DR: This review shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing and highlights some research and development directions on Apache Spark for big data analytics.

Abstract: Apache Spark has emerged as the de facto framework for big data analytics with its advanced in-memory programming model and upper-level libraries for scalable machine learning, graph analysis, streaming and structured data processing. It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R. As a rapidly evolving open source project, with an increasing number of contributors from both academia and industry, it is difficult for researchers to comprehend the full body of development and research behind Apache Spark, especially those who are beginners in this area. In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark. More specifically, it shows what Apache Spark has for designing and implementing big data algorithms and pipelines for machine learning, graph analysis and stream processing. In addition, we highlight some research and development directions on Apache Spark for big data analytics.

241 citations

Journal Article

Zero-Inflated Generalized Poisson Regression Model with an Application to Domestic Violence Data

Felix Famoye, Karan P. Singh

13 Jul 2021-Journal of data science

TL;DR: This article proposes a zero-inflated generalized Poisson (ZIGP) regression model for domestic violence data with too many zeros. The underlying generalized Poisson regression model is a good competitor to the negative binomial regression model when the count data is over-dispersed.

Abstract: The generalized Poisson regression model has been used to model dispersed count data. It is a good competitor to the negative binomial regression model when the count data is over-dispersed. Zero-inflated Poisson and zero-inflated negative binomial regression models have been proposed for situations where the data generating process results in too many zeros. In this paper, we propose a zero-inflated generalized Poisson (ZIGP) regression model to model domestic violence data with too many zeros. Estimation of the model parameters using the method of maximum likelihood is provided. A score test is presented to test whether the number of zeros is too large for the generalized Poisson model to adequately fit the domestic violence data.

229 citations
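
To make the zero-inflation idea in the ZIGP abstract above concrete, here is a minimal Python sketch of a zero-inflated generalized Poisson probability mass function. It assumes the mean-parameterized generalized Poisson form with mean `mu` and dispersion `alpha >= 0`, mixed with a point mass at zero of weight `phi`; the listing does not state the formula, so the parameterization and all names here are assumptions for illustration, not the authors' code. Regression covariates and the score test are left out.

```python
import math

def gp_pmf(y, mu, alpha):
    """Generalized Poisson pmf (assumed mean parameterization).

    alpha = 0 reduces to the ordinary Poisson; alpha > 0 gives over-dispersion.
    """
    theta = mu / (1.0 + alpha * mu)
    log_p = (y * math.log(theta)
             + (y - 1) * math.log(1.0 + alpha * y)
             - theta * (1.0 + alpha * y)
             - math.lgamma(y + 1))
    return math.exp(log_p)

def zigp_pmf(y, mu, alpha, phi):
    """Mixture: with probability phi a structural zero, otherwise a GP count."""
    base = (1.0 - phi) * gp_pmf(y, mu, alpha)
    return phi + base if y == 0 else base

if __name__ == "__main__":
    # Hypothetical parameter values chosen only to show the extra mass at zero.
    print(gp_pmf(0, 2.0, 0.1), zigp_pmf(0, 2.0, 0.1, 0.3))
```

Setting `phi = 0` recovers the plain generalized Poisson model, which is the null hypothesis that a test for excess zeros would examine.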

Performance

Metrics

Papers: 1,230

Citations: 12,892

No. of papers from the Journal in previous years
Year	Papers
2023	25
2022	53
2021	665
2020	101
2019	82
2018	87