Showing papers in "Open Journal of Statistics in 2014"
••
Abstract: Psychometric
theory requires unidimensionality (i.e.,
scale items should represent a common latent variable). One advocated approach
to test unidimensionality within the Rasch model is to identify two item sets
from a Principal Component Analysis (PCA) of residuals, estimate separate
person measures based on the two item sets, compare the two estimates on a
person-by-person basis using t-tests
and determine the number of cases that differ significantly at the 0.05-level;
if ≤5% of tests are significant, or the lower bound of a binomial 95%
confidence interval (CI) of the observed proportion overlaps 5%, then it is
suggested that strict unidimensionality can be inferred; otherwise the scale is
multidimensional. Given its proposed significance and potential implications,
this procedure needs detailed scrutiny. This paper explores the impact of
sample size and method of estimating the 95% binomial CI upon conclusions
according to recommended conventions. Normal approximation, “exact”, Wilson,
Agresti-Coull, and Jeffreys binomial CIs were calculated for observed
proportions of 0.06, 0.08 and 0.10 and sample sizes from n = 100 to n = 2500.
Lower 95%CI boundaries were inspected regarding coverage of the 5% threshold.
Results showed that all binomial 95% CIs included as well as excluded 5% as an
effect of sample size for all three investigated proportions, except for the
Wilson, Agresti-Coull, and Jeffreys CIs, which did not include 5% for any sample
size with a 10% observed proportion. The normal approximation CI was most
sensitive to sample size. These data illustrate that the PCA/t-test protocol
should be used and interpreted like any other hypothesis-testing procedure: its
conclusions depend on sample size as well as on the binomial CI estimation
procedure. The PCA/t-test protocol should not be viewed as a “definite” test of
unidimensionality and does not replace an integrated quantitative/qualitative
interpretation based on an explicit variable definition in view of the
perspective, context and purpose of measurement.
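To make the sample-size dependence concrete, here is a minimal Python sketch (not the paper's code; the "exact" Clopper-Pearson and Agresti-Coull bounds follow the same pattern) of three of the lower 95% CI bounds examined above, checked against the 5% threshold. scipy is assumed available for the Jeffreys (beta-quantile) interval.

```python
# Lower bounds of three binomial 95% CIs for an observed proportion p_hat of
# significant t-tests out of n persons.
import math
from scipy.stats import beta

Z = 1.959963984540054  # two-sided 95% normal quantile

def normal_lower(p_hat, n):
    # Normal-approximation (Wald) lower bound
    return p_hat - Z * math.sqrt(p_hat * (1 - p_hat) / n)

def wilson_lower(p_hat, n):
    # Wilson score interval lower bound
    centre = (p_hat + Z**2 / (2 * n)) / (1 + Z**2 / n)
    half = (Z / (1 + Z**2 / n)) * math.sqrt(p_hat * (1 - p_hat) / n + Z**2 / (4 * n**2))
    return centre - half

def jeffreys_lower(p_hat, n):
    # Jeffreys interval: Beta(k + 1/2, n - k + 1/2) quantile
    k = round(p_hat * n)  # observed number of significant tests
    return beta.ppf(0.025, k + 0.5, n - k + 0.5)

# Does the lower bound overlap the 5% threshold at p_hat = 0.08? Depends on n:
for n in (100, 500, 2500):
    print(n, normal_lower(0.08, n) <= 0.05, wilson_lower(0.08, n) <= 0.05)
```

At p_hat = 0.08 the lower bounds overlap 5% for small n but not for large n, which is exactly the sample-size dependence the abstract reports.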
74 citations
••
TL;DR: In this article, the authors evaluated the performance of the four most commonly used methods in practice, namely, complete case (CC), mean substitution (MS), last observation carried forward (LOCF), and multiple imputation (MI), and concluded that MI is more reliable and a better grounded statistical method to be used under MAR.
Abstract: Missing data can frequently occur in a longitudinal data analysis. In the literature, many methods have been proposed to handle such an issue. Complete case (CC), mean substitution (MS), last observation carried forward (LOCF), and multiple imputation (MI) are the four most frequently used methods in practice. In a real-world data analysis, the missing data can be MCAR, MAR, or MNAR depending on the reasons that lead to data missing. In this paper, simulations under various situations (including missing mechanisms, missing rates, and slope sizes) were conducted to evaluate the performance of the four methods considered, using bias, RMSE, and 95% coverage probability as evaluation criteria. The results showed that LOCF has the largest bias and the poorest 95% coverage probability in most cases under both MAR and MCAR missing mechanisms. Hence, LOCF should not be used in a longitudinal data analysis. Under the MCAR missing mechanism, the CC and MI methods perform equally well. Under the MAR missing mechanism, MI has the smallest bias, smallest RMSE, and best 95% coverage probability. Therefore, the CC or MI method is appropriate under MCAR, while the MI method is a more reliable and better grounded statistical method under MAR.
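Two of the four methods have near one-line implementations; a minimal numpy sketch (not the authors' simulation code), with NaN marking missed visits in one subject's longitudinal record:

```python
import numpy as np

def locf(y):
    """Last observation carried forward; leading NaNs are left as-is."""
    y = y.astype(float).copy()
    for i in range(1, len(y)):
        if np.isnan(y[i]):
            y[i] = y[i - 1]
    return y

def mean_substitution(y):
    """Replace every missing value with the mean of the observed values."""
    y = y.astype(float).copy()
    y[np.isnan(y)] = np.nanmean(y)  # RHS computed before assignment
    return y

visits = np.array([5.0, 6.0, np.nan, np.nan, 9.0])
print(locf(visits))               # [5. 6. 6. 6. 9.]
print(mean_substitution(visits))  # missing entries become (5 + 6 + 9) / 3
```

LOCF freezes the trajectory at the last seen value, which is what produces the large bias the simulations report when the outcome has a slope.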
33 citations
••
TL;DR: In this article, it was shown that most of the empirical or semi-empirical isotherms proposed to extend the Langmuir formula to sorption (adsorption, chemisorption and biosorption) on heterogeneous surfaces in the gaseous and liquid phase belong to the family and subfamily of the Burr XII cumulative distribution functions.
Abstract: We show that most of the empirical or semi-empirical isotherms proposed to extend the Langmuir formula to sorption (adsorption, chemisorption and biosorption) on heterogeneous surfaces in the gaseous and liquid phase belong to the family and subfamily of the Burr XII cumulative distribution functions. As a consequence they obey relatively simple differential equations which describe birth and death phenomena resulting from mesoscopic and microscopic physicochemical processes. Using probability theory, it is thus possible to give a physical meaning to their empirical coefficients, to calculate well defined quantities and to compare the results obtained from different isotherms. Another interesting consequence of this finding is that it is possible to relate the shape of the isotherm to the distribution of sorption energies, which we have calculated for each isotherm. In particular, we show that the energy distribution corresponding to the Brouers-Sotolongo (BS) isotherm [1] is the Gumbel extreme value distribution. We propose a generalized GBS isotherm, calculate its relevant statistical properties and recover all the previous results by giving well defined values to its coefficients. Finally we show that the Langmuir, the Hill-Sips, the BS and GBS isotherms satisfy the maximum Boltzmann-Shannon entropy principle and therefore should be favoured.
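The family the abstract places these isotherms in can be written down directly; a short sketch of the standard two-shape-parameter Burr XII CDF (the paper's isotherms add scale and saturation constants on top of this form):

```python
# Burr XII cumulative distribution function, F(x) = 1 - (1 + x^c)^(-k),
# for x >= 0 with shape parameters c, k > 0.
def burr12_cdf(x, c, k):
    return 1.0 - (1.0 + x**c) ** (-k)

# Sanity checks: a valid CDF starts at 0 and rises toward 1.
print(burr12_cdf(0.0, c=2.0, k=1.5))   # 0.0
print(burr12_cdf(10.0, c=2.0, k=1.5))  # close to 1
```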
28 citations
••
TL;DR: In this paper, a maximum ranked set sampling procedure with unequal samples (MRSSU) is proposed and its properties are studied under exponential distribution under both perfect and imperfect ranking (with errors in ranking).
Abstract: In this paper, a maximum ranked set sampling procedure with unequal samples (MRSSU) is proposed. The maximum likelihood estimator and a modified maximum likelihood estimator are obtained and their properties are studied under the exponential distribution. These methods are studied under both perfect and imperfect ranking (with errors in ranking). These estimators are then compared with estimators based on simple random sampling (SRS) and ranked set sampling (RSS) procedures. It is shown that the relative efficiencies of the estimators based on MRSSU are better than those of the estimators based on SRS. Simulation results show that the efficiency of the proposed estimator is better than that of the estimator based on RSS under ranking error.
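For contrast with the proposal, ordinary RSS, the benchmark design the abstract compares against, can be sketched as follows (MRSSU instead retains only the maximum from sets of unequal sizes; ranking here is "perfect", i.e. done on the true values, one of the two cases the paper studies):

```python
import random

def rss_sample(draw, set_size, cycles, rng):
    """One ranked set sample: in each cycle, the i-th set of `set_size`
    independent draws contributes its i-th order statistic."""
    sample = []
    for _ in range(cycles):
        for i in range(set_size):
            s = sorted(draw(rng) for _ in range(set_size))
            sample.append(s[i])
    return sample

rng = random.Random(42)
draw = lambda r: r.expovariate(1.0)  # exponential with mean 1
x = rss_sample(draw, set_size=3, cycles=200, rng=rng)
print(sum(x) / len(x))  # should land near the true mean 1.0
```

The RSS sample mean is unbiased for the population mean but has smaller variance than an SRS mean of the same size, which is the efficiency gain the abstract quantifies.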
20 citations
••
TL;DR: In this paper, the authors compare sample quality across two probability samples and one that uses probabilistic cluster sampling combined with random route and quota sampling within the selected clusters in order to define the ultimate survey units.
Abstract: The aim of
this paper is to compare sample quality across two probability samples and one
that uses probabilistic cluster sampling combined with random route and quota
sampling within the selected clusters in order to define the ultimate survey
units. All of them use the face-to-face interview as the survey procedure. The
hypothesis to be tested is that it is possible to achieve the same degree of representativeness
using a combination of random route sampling and quota sampling (with
substitution) as it can be
achieved by means of household sampling (without substitution) based on the
municipal register of inhabitants. We found marked differences in the
age and gender distributions under probability sampling, where the deviations
exceed 6%. A different picture emerges when it comes to comparing the
employment variables, where the quota sampling overestimates the economic
activity rate (2.5%) and the unemployment rate (8%) and underestimates the
employment rate (3.46%).
20 citations
••
TL;DR: In this article, the authors evaluated the effect of item inversion on the construct validity and reliability of psychometric scales and proposed a theoretical framework for the evaluation of the psychometric properties of data gathered with psychometric instruments.
Abstract: This study
evaluated the effect of item inversion on the construct validity and
reliability of psychometric scales and proposed a theoretical framework for the
evaluation of the psychometric properties of data gathered with psychometric
instruments. To this purpose, we used the Maslach Burnout Inventory (MBI), which
is the most widely used psychometric inventory to measure burnout in different
professional contexts (students, teachers, police, doctors, nurses, etc.). The
version of the MBI used was the MBI-Student Survey (MBI-SS). This inventory is
composed of three key dimensions: Exhaustion, Cynicism and Professional
Efficacy. The first two dimensions—which have positively worded items—are
moderately to strongly positively correlated, and show moderate to strong negative
correlations with the 3rd dimension—which has negatively worded items. We
tested the hypothesis that, in college students, formulating the 3rd dimension
of burnout as Inefficacy (reverting the
negatively worded items in the Efficacy dimension) improves the
correlation of the 3rd dimension with the other two dimensions, improves its
internal consistency, and the overall MBI-SS’ construct validity and
reliability. Confirmatory factor analysis results, estimated by Maximum
Likelihood, revealed adequate factorial fit for both forms of the MBI-SS (with
Efficacy) vs. the MBI-SSi (with Inefficacy). Also both forms showed adequate
convergent and discriminant related validity. However, reliability and
convergent validity were higher for the MBI-SSi. There were also stronger
(positive) correlations between the 3 factors in MBI-SSi than the ones observed
in MBI-SS. Results show that positive rewording of the 3rd dimension of the
MBI-SS improves its validity and reliability. We therefore propose that the 3rd
dimension of the MBI-SS should be named Professional Inefficacy and its items
should be positively worded.
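The reverse-scoring operation at the heart of the proposal is simple to state; a minimal sketch, assuming the 0-6 frequency scale of the MBI-SS response format:

```python
# Reverse-score negatively worded items so that high scores mean "Inefficacy":
# each response r maps to scale_min + scale_max - r.
def reverse_score(item_responses, scale_min=0, scale_max=6):
    return [scale_min + scale_max - r for r in item_responses]

print(reverse_score([0, 2, 6]))  # -> [6, 4, 0]
```

After this transformation all three dimensions point in the same direction, which is why the correlations between factors turn positive in the MBI-SSi form.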
18 citations
••
TL;DR: A simulation study to investigate the efficiency of four typical imputation methods with longitudinal data setting under missing completely at random concludes that MI method is the most effective imputation method in the authors' MCAR simulation study.
Abstract: In analyzing data from
clinical trials and longitudinal studies, the issue of missing values is always
a fundamental challenge since the missing data could introduce bias and lead to
erroneous statistical inferences. To deal with this challenge, several imputation
methods have been developed in the literature to handle missing values where
the most commonly used are complete case method, mean imputation method, last
observation carried forward
(LOCF) method, and multiple imputation (MI) method. In this paper, we conduct a
simulation study to investigate
the efficiency of these four typical imputation methods with longitudinal data
setting under missing completely
at random (MCAR). We consider three levels of missingness, from a low
percentage of 5% to higher percentages of 30% and 50%. With this
simulation study, we conclude that the LOCF method has more bias than the
other three methods in most situations. The MI method has the least bias with the
best coverage probability. Thus, we conclude that the MI method is the most
effective imputation method in our MCAR simulation study.
18 citations
••
TL;DR: In this paper, the exact expression of the distribution of the sample matrix of correlations R, with the sample variances acting as parameters, is given for the case where the multivariate normal population does not have null correlations, and applications to the concept of system dependence in Reliability Theory are presented.
Abstract: For the
case where the multivariate normal population does not have null correlations,
we give the exact expression of the distribution of the sample matrix of
correlations R, with the sample
variances acting as parameters. Also, the distribution of its determinant is
established in terms of Meijer G-functions in the null-correlation case.
Several numerical examples are given, and applications to the concept of system
dependence in Reliability Theory are presented.
16 citations
••
TL;DR: In this paper, the ROC curves for Bi-Pareto and Bi-two parameter exponential distributions were calculated using simulations and compared in terms of root mean square and mean absolute errors.
Abstract: In this paper, we find the ROC curves for Bi-Pareto and Bi-two parameter
exponential distributions. Theoretical, parametric and non-parametric values of
the area under the receiver operating characteristic (AUROC) curve for different
parametric combinations have been calculated using simulations. These values are compared
in terms of root mean square and mean absolute errors. The results are
demonstrated for two real data sets.
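The non-parametric AUROC value mentioned above can be computed from the Mann-Whitney identity, AUC = P(score in the positive group exceeds score in the negative group), with ties counted half; a minimal sketch on made-up scores:

```python
# Empirical (non-parametric) AUROC via pairwise comparisons.
def empirical_auroc(pos, neg):
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

pos = [0.9, 0.8, 0.7, 0.6]  # scores in the positive group
neg = [0.5, 0.4, 0.7, 0.2]  # scores in the negative group
print(empirical_auroc(pos, neg))  # 14.5 / 16 pairwise wins
```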
13 citations
••
TL;DR: The opportunity of using the most innovative spatial sampling designs in business surveys, in order to produce samples that are well spread in space, is here tested by means of Monte Carlo experiments.
Abstract: An innovative use
of spatial sampling designs is here presented. Sampling methods which consider
spatial locations of statistical units are already used in agricultural and
environmental contexts, while they have never been exploited for establishment
surveys. However, the rapidly increasing availability of geo-referenced
information about business units makes that possible. In business studies, it
may indeed be important to take into account the presence of spatial
autocorrelation or spatial trends in the variables of interest, in order to
have more precise and efficient estimates. The opportunity of using the most
innovative spatial sampling designs in business surveys, in order to produce
samples that are well spread in space, is here tested by means of Monte Carlo experiments.
For all designs, the Horvitz-Thompson estimator of the population total is used
both with equal and unequal inclusion probabilities. The efficiency of sampling
designs is evaluated in terms of relative RMSE and efficiency gain compared
with designs ignoring the spatial information. Furthermore, an evaluation of
spatially balancing samples is also conducted.
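The Horvitz-Thompson estimator used for every design in the study weights each sampled value by the inverse of its inclusion probability; a minimal sketch (not the simulation code):

```python
# Horvitz-Thompson estimator of a population total:
# T_hat = sum over sampled units of y_i / pi_i.
def horvitz_thompson_total(sample_values, inclusion_probs):
    return sum(y / p for y, p in zip(sample_values, inclusion_probs))

# Equal-probability example: n = 2 units drawn from N = 4, so pi = n/N = 0.5
# and the estimator is just (N/n) times the sample sum.
print(horvitz_thompson_total([10.0, 30.0], [0.5, 0.5]))  # -> 80.0
```

Spatially balanced designs change the inclusion probabilities and the joint selection behaviour, not the estimator itself, which is why the comparison above can hold the estimator fixed.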
13 citations
••
TL;DR: An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition, and k-nearest neighbors yield the highest average test accuracy.
Abstract: An algorithm involving Mel-Frequency Cepstral Coefficients (MFCCs) is provided to perform signal feature extraction for the task of speaker accent recognition. Then different classifiers are compared based on the MFCC features. For each signal, the mean vector of the MFCC matrix is used as an input vector for pattern recognition. A sample of 330 signals, containing 165 US voices and 165 non-US voices, is analyzed. By comparison, k-nearest neighbors yields the highest average test accuracy, after using a cross-validation of size 500, and requires the least computation time.
••
TL;DR: Findings of the study suggest that this simulation-based power analysis method can be used to estimate sample size and statistical power for Guastello’s polynomial regression method in cusp catastrophe modeling.
Abstract: Guastello’s polynomial regression method for solving cusp catastrophe model has been widely applied to analyze nonlinear behavior outcomes. However, no statistical power analysis for this modeling approach has been reported probably due to the complex nature of the cusp catastrophe model. Since statistical power analysis is essential for research design, we propose a novel method in this paper to fill in the gap. The method is simulation-based and can be used to calculate statistical power and sample size when Guastello’s polynomial regression method is used to do cusp catastrophe modeling analysis. With this novel approach, a power curve is produced first to depict the relationship between statistical power and samples size under different model specifications. This power curve is then used to determine sample size required for specified statistical power. We verify the method first through four scenarios generated through Monte Carlo simulations, and followed by an application of the method with real published data in modeling early sexual initiation among young adolescents. Findings of our study suggest that this simulation-based power analysis method can be used to estimate sample size and statistical power for Guastello’s polynomial regression method in cusp catastrophe modeling.
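The simulation-based power idea above is general: simulate many datasets under the assumed effect, run the test on each, and take the rejection rate as the estimated power. A sketch with a plain one-sample t-test standing in for Guastello's cusp regression, purely to illustrate the loop (the effect size, n, and critical value are this sketch's assumptions):

```python
import random, statistics, math

def simulated_power(effect, n, reps, seed=1):
    """Monte Carlo power: fraction of simulated datasets where the test rejects."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(effect, 1.0) for _ in range(n)]
        t = statistics.mean(x) / (statistics.stdev(x) / math.sqrt(n))
        if abs(t) > 1.96:  # large-sample critical value, good enough for a sketch
            rejections += 1
    return rejections / reps

print(simulated_power(effect=0.5, n=50, reps=500))  # high power at this effect/n
```

Repeating this over a grid of n values traces out the power curve the paper uses to read off the required sample size.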
••
TL;DR: In this article, the shape parameter of Weibull distribution is used to calculate PCIs for the verification and validation purpose of two data sets for verification purpose and the effectiveness of the technique is assessed by bootstrapping the results of estimate and standard error of shape parameter.
Abstract: Process capability analysis is used to determine whether process performance is capable or incapable within a specified tolerance. The basic indices Cp, Cpk, Cpm, and Cpmk, initially developed for normally distributed processes, proved inappropriate for processes with non-normal distributions. A number of authors have worked on non-normal distributions, most notably Clements, Pearn and Chen, Montgomery, and Johnson-Kotz-Pearn (JKP). However, obtaining PCIs based on the parameters of non-normal distributions has been largely disregarded, even though the parameters of some non-normal distributions are informative about whether a process is capable or incapable. In this article we work on the shape parameter of the Weibull distribution to calculate PCIs. We use two data sets for verification and validation purposes. The efficacy of the technique is assessed by bootstrapping the estimate and standard error of the shape parameter.
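The bootstrapping step described above can be sketched in a few lines: refit the Weibull on resampled data and take the spread of the refitted shape estimates as its standard error. A hedged sketch with synthetic data (scipy assumed available; the paper's two real data sets are not reproduced here):

```python
import numpy as np
from scipy.stats import weibull_min

rng = np.random.default_rng(0)
data = weibull_min.rvs(c=2.0, scale=1.0, size=200, random_state=rng)

shapes = []
for _ in range(100):
    boot = rng.choice(data, size=len(data), replace=True)  # resample with replacement
    c_hat, _, _ = weibull_min.fit(boot, floc=0)            # MLE, location fixed at 0
    shapes.append(c_hat)

print(np.mean(shapes), np.std(shapes))  # shape estimate and its bootstrap SE
```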
••
TL;DR: In this paper, the authors proposed a method for estimating the duration of the hiatus that is robust to unknown forms of heteroskedasticity and autocorrelation (HAC) in the temperature series and to cherry-picking of endpoints.
Abstract: The IPCC has drawn attention to an apparent
leveling-off of globally-averaged temperatures over the past 15 years or so.
Measuring the duration of the hiatus has implications for determining if the
underlying trend has changed, and for evaluating climate models. Here, I
propose a method for estimating the duration of the hiatus that is robust to
unknown forms of heteroskedasticity and autocorrelation (HAC) in the
temperature series and to cherry-picking of endpoints. For the specific case of
global average temperatures I also add the requirement of spatial consistency between
hemispheres. The method makes use of the Vogelsang-Franses (2005) HAC-robust
trend variance estimator which is valid as long as the underlying series is
trend stationary, which is the case for the data used herein. Application of
the method shows that there is now a trendless interval of 19 years duration at
the end of the HadCRUT4 surface temperature series, and of 16 - 26 years in the
lower troposphere. Use of a simple AR1 trend model suggests a shorter hiatus of
14 - 20 years but is likely unreliable.
••
TL;DR: In this paper, a new nonparametric test based on the rank difference between the paired sample for testing the equality of the marginal distributions from a bivariate distribution was proposed, which has comparable power to the paired t test for the data simulated from bivariate normal distributions.
Abstract: We propose a new nonparametric test based on the
rank difference between the paired sample for testing the equality of the
marginal distributions from a bivariate distribution. We also consider a
modification of the novel nonparametric test based on the test proposed by
Baumgartner, Weiß, and Schindler (1998). An
extensive numerical power comparison for various parametric and nonparametric
tests was conducted under a wide range of bivariate distributions for small
sample sizes. The two new nonparametric tests have comparable power to the
paired t test for the data simulated from bivariate normal distributions, and
are generally more powerful than the paired t test and other commonly used
nonparametric tests in several important bivariate distributions.
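The comparison above can be reproduced in miniature with a paired t-test against a standard rank-based paired test (scipy's Wilcoxon signed-rank, used here as a stand-in for the proposed statistics) on correlated paired data; the marginal mean shift of 1.5 and the simulation settings are this sketch's assumptions:

```python
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 30)
y = 0.6 * x + rng.normal(1.5, 0.8, 30)  # correlated with x, marginal mean shifted

t_p = ttest_rel(x, y).pvalue  # paired t-test on the marginal means
w_p = wilcoxon(x, y).pvalue   # rank-based paired test on the same pairs
print(t_p, w_p)               # both should detect this large shift
```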
••
TL;DR: In this article, the Weibull kernel is used to estimate the hazard rate and the probability density function for independent and identically distributed (iid) data, and the performance of the proposed estimator is tested using simulation study and real data.
Abstract: In this paper, we define the Weibull kernel and use it for nonparametric estimation of the probability density function (pdf) and the hazard rate function for independent and identically distributed (iid) data. The bias, variance and the optimal bandwidth of the proposed estimator are investigated. Moreover, the asymptotic normality of the proposed estimator is established. The performance of the proposed estimator is tested using a simulation study and real data.
••
TL;DR: In this article, the authors focus on the detection and estimation of changes in patients' failure rates, which is important for the evaluation and comparison of treatments and prediction of their effects.
Abstract: Effects of many medical procedures appear after a time lag, when a significant change occurs in subjects’ failure rate. This paper focuses on the detection and estimation of such changes which is important for the evaluation and comparison of treatments and prediction of their effects. Unlike the classical change-point model, measurements may still be identically distributed, and the change point is a parameter of their common survival function. Some of the classical change-point detection techniques can still be used but the results are different. Contrary to the classical model, the maximum likelihood estimator of a change point appears consistent, even in presence of nuisance parameters. However, a more efficient procedure can be derived from Kaplan-Meier estimation of the survival function followed by the least-squares estimation of the change point. Strong consistency of these estimation schemes is proved. The finite-sample properties are examined by a Monte Carlo study. Proposed methods are applied to a recent clinical trial of the treatment program for strong drug dependence.
••
TL;DR: In this paper, the authors used principal component regression (PCR) to determine the time lag of GCM data and to build a statistical downscaling model using the PCR method with the time lag.
Abstract: Statistical downscaling (SD) analyzes relationship between local-scale response and global-scale predictors. The SD model can be used to forecast rainfall (local-scale) using global-scale precipitation from global circulation model output (GCM). The objectives of this research were to determine the time lag of GCM data and build SD model using PCR method with time lag of the GCM precipitation data. The observations of rainfall data in Indramayu were taken from 1979 to 2007 showing similar patterns with GCM data on 1st grid to 64th grid after time shift (time lag). The time lag was determined using the cross-correlation function. However, GCM data of 64 grids showed multicollinearity problem. This problem was solved by principal component regression (PCR), but the PCR model resulted heterogeneous errors. PCR model was modified to overcome the errors with adding dummy variables to the model. Dummy variables were determined based on partial least squares regression (PLSR). The PCR model with dummy variables improved the rainfall prediction. The SD model with lag-GCM predictors was also better than SD model without lag-GCM.
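The lag-determination step via the cross-correlation function can be sketched on synthetic series where the "local" signal trails the "global" one by 3 steps (numpy assumed; this is an illustration of the idea, not the paper's GCM data pipeline):

```python
import numpy as np

def best_lag(y_local, x_global, max_lag):
    """Lag (in steps) at which x_global[t - lag] correlates best with y_local[t]."""
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        a = y_local[lag:]
        b = x_global[:len(x_global) - lag]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best

rng = np.random.default_rng(3)
g = rng.normal(size=200)                                   # "global" predictor
local_rain = np.roll(g, 3) + rng.normal(scale=0.1, size=200)  # lagged, noisy copy
print(best_lag(local_rain, g, max_lag=10))  # -> 3
```

Shifting each grid's predictor by its estimated lag is what aligns the GCM series with the observed rainfall before the PCR step.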
••
TL;DR: In this paper, the authors used the Cox proportional hazards test to determine the appropriate method for modeling the birth of the first child in Indonesia, considering that newly married couples tend to want a baby as soon as possible, a desire that weakens with increasing age at marriage.
Abstract: First birth interval is one of the examples of survival data. One of the characteristics of survival data is that the observation period may be partly unobservable, i.e. censored. Analyzing censored data using ordinary methods will lead to bias, so reducing such bias requires a class of methods called survival analysis. Two kinds of methods are used in survival analysis: parametric and non-parametric. The objective of this paper is to determine the appropriate method for modeling the birth of the first child. The exponential model with the inclusion of covariates is used as the parametric method, considering that newly married couples tend to have a desire for having a baby as soon as possible, a desire that weakens with increasing age at marriage. The data analyzed were taken from the Indonesia Demographic and Health Survey (IDHS) 2012. The result of the data analysis shows that the first-birth data are not exponentially distributed, so the Cox proportional hazards method is used. Because a disproportional covariate was suspected, a proportional hazards test was conducted, which showed that the age covariate is not proportional; therefore the generalized Cox proportional hazards method, namely the extended Cox model, which allows the inclusion of disproportional covariates, is used. The result of the analysis using the extended Cox model indicates that the factors affecting the birth of the first child in Indonesia are area of residence, educational history and age.
••
TL;DR: An inference framework on the modality of a KDE under multivariate setting using Gaussian kernel is developed and the modal clustering method proposed by [1] for mode hunting is applied.
Abstract: The number of modes
(also known as modality) of a kernel density estimator (KDE) draws a lot of
interest and is important in practice. In this paper, we develop an inference
framework on the modality of a KDE under a multivariate setting using the Gaussian
kernel. We apply the modal clustering method proposed by [1] for mode hunting. A test statistic and its
asymptotic distribution are derived to assess the significance of each mode.
The inference procedure is applied on both simulated and real data sets.
••
TL;DR: In this article, two different tools to evaluate quantile regression and predictions are proposed: MAD, to summarize forecast errors, and a fluctuation test to evaluate in-sample predictions.
Abstract: Two different tools to evaluate quantile regression
forecasts are proposed: MAD, to summarize forecast errors, and a fluctuation
test to evaluate in-sample predictions. The scores of the PISA test to evaluate
students’ proficiency are considered. Growth analysis relates school attainment
to economic growth. The analysis is complemented by investigating the estimated
regression and predictions not only at the centre but also in the tails. For
out-of-sample forecasts, the estimates in one wave are employed to forecast the
following waves. The reliability of in-sample forecasts is controlled by
excluding the part of the sample selected by a specific rule: boys to predict
girls, public schools to forecast private ones, vocational schools to predict
non-vocational, etc. The gradient computed in the subset is compared to its
analogue computed in the full sample in order to verify the validity of the
estimated equation and thus of the in-sample predictions.
••
TL;DR: In this paper, the analysis of market sentiments in exchange rates is discussed, which are of great interest to trading individuals and institutional investors, and a multinomial probability model is built to capture the uncertainties in market sentiments.
Abstract: The paper deals with the analysis of market sentiments in exchange rates
which are of great interest to trading individuals and institutional investors.
For example, an institutional investor or a trading individual makes better investments and
minimizes losses when equipped with an understanding of market sentiments in
weekly or monthly exchange returns. In the approach suggested here, a typical
market sentiment is defined on the basis of a certain function of the mean
and the standard error of the logarithm of the ratio of successive daily
exchange rates. Based on this surmise, the market sentiments are classified
into various states, whereby states are defined according to the perceptions of
the market player. A multinomial probability model is built to capture the uncertainties in market
sentiments. Two asymptotically distribution-free tests, namely the chi-square and the likelihood ratio test of
goodness of fit for the hypothesis of the symmetry in market sentiments are
suggested. Two different measures of market sentiments are proposed. The
approach advocated here will be of interest to researchers, exchange rate
traders and financial analysts. As an application of the proposed line of
approach, we analyze weekly market sentiments that govern exchange rates of the
major global currencies (EUR, GBP, SDR, YEN,
ZAR, USD), using data from 2001-2012. Some interesting conclusions are revealed based
on the data analysis.
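The chi-square goodness-of-fit step can be sketched with scipy; here "symmetry" is simplified to "all k sentiment states equally likely", which is an assumption of this sketch rather than the paper's exact hypothesis (their symmetry hypothesis induces its own expected frequencies), and the counts are made up:

```python
from scipy.stats import chisquare

observed = [60, 55, 45, 48]    # hypothetical weekly counts of 4 sentiment states
stat, p = chisquare(observed)  # default: equal expected frequencies (52 each)
print(stat, p)                 # small statistic, large p: no evidence against symmetry
```

The likelihood-ratio version replaces the Pearson statistic with 2 * sum(obs * log(obs/exp)); both are asymptotically chi-square with k - 1 degrees of freedom.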
••
TL;DR: In this article, two extensions of the stochastic logistic model for fish growth have been examined and the basic features of a logistic growth rate are deeply influenced by the carrying capacity of the system and the changes are periodical with time.
Abstract: Two extensions of
the stochastic logistic model for fish growth have been examined. The basic
features of a logistic growth rate are deeply influenced by the carrying
capacity of the system and the changes are periodical with time. Introduction
of a new parameter enlarges the scope of investigating the growth of different
fish species; for rapid growth the parameter lies between 1 and 2, while other
values correspond to slowly growing species.
••
TL;DR: Results suggest that sparse Bayesian Multinomial Probit model applied to cancer progression data allows for better subclass prediction and produces more functionally relevant gene sets.
Abstract: A major limitation
of expression profiling is caused by the large number of variables assessed
compared to relatively small sample sizes. In this study, we developed a
multinomial Probit Bayesian model which utilizes the double exponential prior
to induce shrinkage and reduce the number of covariates in the model [1]. A hierarchical Sparse Bayesian Generalized
Linear Model (SBGLM) was developed in order to facilitate Gibbs sampling which
takes into account the progressive nature of the response variable. The method
was evaluated using a published dataset (GSE6099) which contained 99 prostate
cancer cell types in four different progressive stages [2]. Initially, 398 genes were selected using
ordinal logistic regression with a cutoff value of 0.05 after Benjamini and
Hochberg FDR correction. The dataset was randomly divided into training (N = 50)
and test (N = 49) groups such that each group contained equal number of each
cancer subtype. In order to obtain more robust results we performed 50
re-samplings of the training and test groups. Using the top ten genes obtained
from SBGLM, we were able to achieve an average classification accuracy of 85% and
80% in training and test groups, respectively. To functionally evaluate the
model performance, we used a literature mining approach called Geneset Cohesion
Analysis Tool [3]. Examination of the top 100 genes produced
an average functional cohesion p-value of 0.007 compared to 0.047 and 0.131
produced by classical multi-category logistic regression and Random Forest
approaches, respectively. In addition, 96 percent of the SBGLM runs resulted in
a GCAT literature cohesion p-value smaller than 0.047. Taken together, these
results suggest that sparse Bayesian Multinomial Probit model applied to cancer
progression data allows for better subclass prediction and produces more
functionally relevant gene sets.
••
TL;DR: In this paper, the authors give a study on the performance of two specific modifications of the Weibull distribution, which are the exponentiated Weibull distribution and the additive Weibull distribution.
Abstract: Proposed by the Swedish engineer and mathematician Ernst Hjalmar Waloddi Weibull (1887-1979), the Weibull distribution is a probability distribution that is widely used to model lifetime data. Because of its flexibility, some modifications of the Weibull distribution have been made from several researches in order to best adjust the non-monotonic shapes. This paper gives a study on the performance of two specific modifications of the Weibull distribution which are the exponentiated Weibull distribution and the additive Weibull distribution.
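As a rough illustration of the two modifications (not the paper's own code), the exponentiated Weibull is available directly in SciPy, and the additive Weibull's bathtub-shaped hazard can be written out from its definition; all parameter values below are illustrative:

```python
# Sketch of the two Weibull modifications named in the abstract.
import numpy as np
from scipy.stats import exponweib

x = np.linspace(0.05, 3.0, 200)

# Exponentiated Weibull: an extra shape parameter a on top of Weibull shape c.
pdf_ew = exponweib.pdf(x, a=2.0, c=1.5)

# Additive Weibull: the hazard is the sum of two Weibull hazards,
# h(x) = (a1/b1)(x/b1)^(a1-1) + (a2/b2)(x/b2)^(a2-1).
a1, b1, a2, b2 = 0.5, 1.0, 3.0, 2.0   # a1 < 1 and a2 > 1 -> bathtub shape

def additive_weibull_hazard(x):
    return (a1 / b1) * (x / b1) ** (a1 - 1) + (a2 / b2) * (x / b2) ** (a2 - 1)

h = additive_weibull_hazard(x)
print("hazard at start / end:", h[0], h[-1])  # high early and late failure rates
```

With one shape parameter below 1 and the other above 1, the additive form captures the non-monotonic (bathtub) hazards that motivate these modifications.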
••
TL;DR: In this article, Principal Component Analysis was applied to a set of variables covering socio-economic class, urban environment, and travel characteristics in a sample of workers of the São Paulo Metropolitan Area, drawn from the 1997 origin-destination home interview survey, in order to examine the interdependence between travel patterns and the socio-economic and urban-environment variables.
Abstract: The main objective of this study is to analyze work-travel behavior through a set of variables relative to socio-economic class, urban environment, and travel characteristics. Principal Component Analysis was applied to a sample consisting of workers of the São Paulo Metropolitan Area, based on the origin-destination home interview survey carried out in 1997, in order to: 1) examine the interdependence between travel patterns and a set of socio-economic and urban-environment variables; 2) determine whether the original database can be synthesized into components. The results made it possible to observe relations between the individual’s socio-economic class and car usage, between characteristics of the urban environment and destination choices, as well as between age and non-motorized travel mode choice. It is then concluded that the database can be adequately summarized in three components for subsequent analysis: 1) urban environment; 2) socio-economic class; and 3) family structure.
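The PCA step described here can be sketched in a few lines; the variables and data below are illustrative stand-ins, not the actual 1997 São Paulo survey:

```python
# Minimal PCA workflow: standardize mixed socio-economic / urban-environment /
# travel variables, then inspect how many components summarize them.
# Variable names and values are synthetic, for illustration only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 500
income = rng.normal(50, 15, n)
car_use = 0.8 * income + rng.normal(0, 10, n)   # correlated with income
density = rng.normal(0, 1, n)
trip_len = 0.7 * density + rng.normal(0, 1, n)  # correlated with density
age = rng.normal(40, 12, n)
X = np.column_stack([income, car_use, density, trip_len, age])

# standardization is essential when variables are on different scales
Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)
print("explained variance ratios:", pca.explained_variance_ratio_.round(2))
```

Because two pairs of variables are correlated by construction, a few components capture most of the variance, mirroring the study's finding that three components adequately summarize the database.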
••
TL;DR: In this article, the relative strength and rotational robustness of some SWT-based tests for multivariate normality are investigated, including Royston's H-test and the SWT-based test proposed by Villaseñor-Alva and González-Estrada.
Abstract: The Shapiro-Wilk test (SWT) for normality is well known for its competitive power against numerous one-dimensional alternatives. Several extensions of the SWT to multiple dimensions have also been proposed. This paper investigates the relative strength and rotational robustness of some SWT-based normality tests. In particular, Royston’s H-test and the SWT-based test proposed by Villaseñor-Alva and González-Estrada have R packages available for testing multivariate normality; thus they are user-friendly, but they lack the rotational robustness of the test proposed by Fattorini. A numerical power comparison is provided for illustration, along with some practical guidelines on the choice among these SWT-type tests in practice.
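A simplified toy stand-in for an SWT-based multivariate check (not a reimplementation of Royston's H-test) decorrelates the data, applies the univariate Shapiro-Wilk test to each coordinate, and Bonferroni-combines the p-values:

```python
# Toy SWT-based multivariate normality check: whiten, test each margin
# with the univariate Shapiro-Wilk test, Bonferroni-combine the p-values.
# The whitening step depends on the data's orientation, which is why
# rotational robustness matters for tests built this way.
import numpy as np
from scipy import stats

def swt_marginal_test(X):
    X = X - X.mean(axis=0)
    # whitening transform: remove correlations between coordinates
    L = np.linalg.cholesky(np.linalg.inv(np.cov(X, rowvar=False)))
    Z = X @ L
    pvals = [stats.shapiro(Z[:, j]).pvalue for j in range(Z.shape[1])]
    return min(1.0, min(pvals) * Z.shape[1])  # Bonferroni-adjusted minimum

rng = np.random.default_rng(2)
normal_data = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
skewed_data = np.column_stack([rng.exponential(size=200),
                               rng.normal(size=200)])

print("correlated normal data p:", swt_marginal_test(normal_data))
print("skewed data p:          ", swt_marginal_test(skewed_data))
```

The skewed sample is rejected decisively while the correlated normal sample is not; the actual H-test combines transformed SWT statistics more carefully than this Bonferroni shortcut.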
••
TL;DR: In this paper, a general framework for large scale modeling of macroeconomic and financial time series is introduced, which is characterized by simplicity of implementation and performs well independently of persistence and heteroskedasticity properties, accounting for common deterministic and stochastic factors.
Abstract: In the paper, a
general framework for large scale modeling of macroeconomic and financial time
series is introduced. The proposed approach is characterized by simplicity of
implementation, performing well independently of persistence and
heteroskedasticity properties, accounting for common deterministic and
stochastic factors. Monte Carlo results strongly support the proposed
methodology, validating its use also for relatively small cross-sectional and
temporal samples.
••
TL;DR: Performance on the Trail Making Test B did not correlate with pain, fatigue, depression, anxiety, or sensation of rest, and TMT-B cannot be considered fully validated.
Abstract: Introduction:
Cognitive impairment is common in patients with cancer; however, studies examining
the adaptation and validation of instruments for use in patients with cancer
are scarce. Purpose: The purpose of this study was to validate the Trail Making
Test B (TMT-B) for use in patients with cancer. Methods: Ninety-four
outpatients receiving palliative treatment and 39 healthy companions were
assessed. Patients were tested with the TMT-B and answered questions regarding
the presence and intensity of pain, fatigue, quality of sleep, anxiety, and
depression, at two time points with a 7-day inter-assessment interval. Results:
The instrument discriminated between patients, who were slower, and healthy
companions with respect to the time required to complete the test, but not in
terms of the number of errors. The test was stable for the healthy companions
across the two assessments in terms of time to complete the TMT-B and the
number of errors; for patients, the instrument was stable only for the number
of errors. Performance on the TMT-B did not correlate with pain, fatigue,
depression, anxiety, or sensation of rest. Conclusions: TMT-B cannot be
considered fully validated. Further studies incorporating and comparing other
instruments evaluating executive function and mental flexibility are needed.
••
TL;DR: It is shown that many of the common choices in hypothesis testing lead to a severely underpowered form of theory evaluation, that confirmatory methods are required in the context of theory evaluation, and that the scientific literature would benefit from a clearer distinction between confirmatory and exploratory findings.
Abstract: Experimental studies are usually designed with specific expectations about the results in mind. However, most researchers apply some form of omnibus test to test for any differences, with follow-up tests such as pairwise comparisons or simple-effects analyses for further investigation of the effects. The power to find full support for the theory with such an exploratory approach, which is usually based on multiple testing, is, however, rather disappointing. With the simulations in this paper we showed that many of the common choices in hypothesis testing lead to a severely underpowered form of theory evaluation. Furthermore, some less commonly used approaches were presented and compared in terms of power to find support for the theory. We concluded that confirmatory methods are required in the context of theory evaluation and that the scientific literature would benefit from a clearer distinction between confirmatory and exploratory findings. We also emphasize the importance of reporting all tests, significant or not, including the appropriate sample statistics such as means and standard deviations. Another recommendation relates to the fact that researchers, when discussing the conclusions of their own study, seem to underestimate the role of sampling variability. The execution of more replication studies, in combination with proper reporting of all results, provides insight into between-study variability and the number of chance findings.
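The power gap the abstract describes can be reproduced with a small simulation: under an ordered alternative, demanding a significant omnibus test plus all pairwise comparisons yields "full support" far less often than a single directional contrast. Effect sizes and sample sizes below are illustrative, not the paper's actual simulation design:

```python
# Toy power simulation: exploratory route (omnibus F + all pairwise t-tests)
# versus confirmatory route (one directional linear contrast) for an
# ordered alternative mu1 < mu2 < mu3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, reps = 20, 1000
mus = [0.0, 0.4, 0.8]            # ordered alternative, illustrative effects

full_support = contrast_hits = 0
for _ in range(reps):
    g = [rng.normal(mu, 1.0, n) for mu in mus]

    # exploratory route: omnibus F-test followed by all pairwise t-tests
    f_p = stats.f_oneway(*g).pvalue
    pair_ps = [stats.ttest_ind(g[i], g[j]).pvalue
               for i in range(3) for j in range(i + 1, 3)]
    if f_p < 0.05 and all(p < 0.05 for p in pair_ps):
        full_support += 1

    # confirmatory route: one directional linear contrast (-1, 0, 1)
    c = np.array([-1.0, 0.0, 1.0])
    means = np.array([x.mean() for x in g])
    mse = np.mean([x.var(ddof=1) for x in g])    # pooled variance (equal n)
    t = (c @ means) / np.sqrt(mse * np.sum(c ** 2) / n)
    if stats.t.sf(t, df=3 * (n - 1)) < 0.05:
        contrast_hits += 1

print(f"full-support rate: {full_support / reps:.2f}, "
      f"contrast rate: {contrast_hits / reps:.2f}")
```

Requiring every pairwise comparison to reach significance multiplies the chances of missing at least one effect, whereas the single contrast pools all the ordered-alternative evidence into one test.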