
Showing papers on "Sample size determination published in 2022"


Journal ArticleDOI
TL;DR: In this article, the authors used three of the largest neuroimaging datasets currently available, with a total sample size of around 50,000 individuals, to quantify brain-wide association study (BWAS) effect sizes and reproducibility as a function of sample size.
Abstract: Magnetic resonance imaging (MRI) has transformed our understanding of the human brain through well-replicated mapping of abilities to specific structures (for example, lesion studies) and functions1-3 (for example, task functional MRI (fMRI)). Mental health research and care have yet to realize similar advances from MRI. A primary challenge has been replicating associations between inter-individual differences in brain structure or function and complex cognitive or mental health phenotypes (brain-wide association studies (BWAS)). Such BWAS have typically relied on sample sizes appropriate for classical brain mapping4 (the median neuroimaging study sample size is about 25), but potentially too small for capturing reproducible brain-behavioural phenotype associations5,6. Here we used three of the largest neuroimaging datasets currently available-with a total sample size of around 50,000 individuals-to quantify BWAS effect sizes and reproducibility as a function of sample size. BWAS associations were smaller than previously thought, resulting in statistically underpowered studies, inflated effect sizes and replication failures at typical sample sizes. As sample sizes grew into the thousands, replication rates began to improve and effect size inflation decreased. More robust BWAS effects were detected for functional MRI (versus structural), cognitive tests (versus mental health questionnaires) and multivariate methods (versus univariate). Smaller than expected brain-phenotype associations and variability across population subsamples can explain widespread BWAS replication failures. In contrast to non-BWAS approaches with larger effects (for example, lesions, interventions and within-person), BWAS reproducibility requires samples with thousands of individuals.

611 citations
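
To make the sample-size argument above concrete, the sketch below estimates how many participants a simple two-sided test of a Pearson correlation needs at 80% power, using the standard Fisher z approximation. The candidate r values are illustrative round numbers in the range the study discusses, not figures taken from the paper.

```python
import math
from scipy.stats import norm

def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a Pearson correlation r
    with a two-sided test, via the Fisher z transformation."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

# Illustrative effect sizes; univariate BWAS effects are reported to be small.
for r in (0.3, 0.1, 0.05):
    print(f"r = {r:.2f}: n = {n_for_correlation(r)}")
```

For r around 0.1 the requirement is already close to a thousand participants, and for r = 0.05 it climbs into the thousands, which is the paper's central point.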


Journal ArticleDOI
TL;DR: The authors conducted a systematic review of four databases to identify studies empirically assessing sample sizes for saturation in qualitative research, supplemented by searching citing articles and reference lists, and identified 23 articles that used empirical data or statistical modeling to assess saturation.
Abstract: To review empirical studies that assess saturation in qualitative research in order to identify sample sizes for saturation, strategies used to assess saturation, and guidance we can draw from these studies. We conducted a systematic review of four databases to identify studies empirically assessing sample sizes for saturation in qualitative research, supplemented by searching citing articles and reference lists. We identified 23 articles that used empirical data (n = 17) or statistical modeling (n = 6) to assess saturation. Studies using empirical data reached saturation within a narrow range of interviews (9-17) or focus group discussions (4-8), particularly those with relatively homogenous study populations and narrowly defined objectives. Most studies had a relatively homogenous study population and assessed code saturation; the few outliers (e.g., multi-country research, meta-themes, "code meaning" saturation) needed larger samples for saturation. Despite varied research topics and approaches to assessing saturation, studies converged on a relatively consistent sample size for saturation for commonly used qualitative research methods. However, these findings apply to certain types of studies (e.g., those with homogenous study populations). These results provide strong empirical guidance on effective sample sizes for qualitative research, which can be used in conjunction with the characteristics of individual studies to estimate an appropriate sample size prior to data collection. This synthesis also provides an important resource for researchers, academic journals, journal reviewers, ethical review boards, and funding agencies to facilitate greater transparency in justifying and reporting sample sizes in qualitative research. Future empirical research is needed to explore how various parameters affect sample sizes for saturation.

404 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of 14 differential abundance testing methods on 38 16S rRNA gene datasets with two sample groups and test for differences in amplicon sequence variants and operational taxonomic units (ASVs) between these groups.
Abstract: Identifying differentially abundant microbes is a common goal of microbiome studies. Multiple methods are used interchangeably for this purpose in the literature. Yet, there are few large-scale studies systematically exploring the appropriateness of using these tools interchangeably, and the scale and significance of the differences between them. Here, we compare the performance of 14 differential abundance testing methods on 38 16S rRNA gene datasets with two sample groups. We test for differences in amplicon sequence variants and operational taxonomic units (ASVs) between these groups. Our findings confirm that these tools identified drastically different numbers and sets of significant ASVs, and that results depend on data pre-processing. For many tools the number of features identified correlate with aspects of the data, such as sample size, sequencing depth, and effect size of community differences. ALDEx2 and ANCOM-II produce the most consistent results across studies and agree best with the intersect of results from different approaches. Nevertheless, we recommend that researchers should use a consensus approach based on multiple differential abundance methods to help ensure robust biological interpretations.

185 citations
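
As a rough illustration of the consensus approach recommended above, the sketch below intersects the significant-feature calls from several differential-abundance tools. The column names ("feature", "qvalue") and the per-tool result tables are hypothetical placeholders; in practice each table would come from running a tool such as ALDEx2 or ANCOM-II on the same dataset.

```python
# Minimal sketch of a consensus rule across differential-abundance tools.
import pandas as pd

def significant_features(results: pd.DataFrame, qcol: str = "qvalue",
                         cutoff: float = 0.05) -> set:
    """Return the set of feature IDs one tool calls significant."""
    return set(results.loc[results[qcol] < cutoff, "feature"])

def consensus(results_by_tool: dict, min_tools: int = 2) -> set:
    """Keep ASVs flagged by at least `min_tools` of the supplied methods."""
    counts = {}
    for df in results_by_tool.values():
        for feat in significant_features(df):
            counts[feat] = counts.get(feat, 0) + 1
    return {feat for feat, n in counts.items() if n >= min_tools}
```

Requiring agreement from two or more methods is one simple way to implement the "consensus approach" the authors recommend; the threshold itself is a design choice.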


Journal ArticleDOI
01 Jan 2022
TL;DR: In this paper, six approaches are discussed to justify the sample size in a quantitative empirical study: collecting data from (almost) the entire population, choosing a sample size based on resource constraints, performing an a-priori power analysis, planning for a desired accuracy, using heuristics, or explicitly acknowledging the absence of a justification.
Abstract: An important step when designing an empirical study is to justify the sample size that will be collected. The key aim of a sample size justification for such studies is to explain how the collected data is expected to provide valuable information given the inferential goals of the researcher. In this overview article six approaches are discussed to justify the sample size in a quantitative empirical study: 1) collecting data from (almost) the entire population, 2) choosing a sample size based on resource constraints, 3) performing an a-priori power analysis, 4) planning for a desired accuracy, 5) using heuristics, or 6) explicitly acknowledging the absence of a justification. An important question to consider when justifying sample sizes is which effect sizes are deemed interesting, and the extent to which the data that is collected informs inferences about these effect sizes. Depending on the sample size justification chosen, researchers could consider 1) what the smallest effect size of interest is, 2) which minimal effect size will be statistically significant, 3) which effect sizes they expect (and what they base these expectations on), 4) which effect sizes would be rejected based on a confidence interval around the effect size, 5) which ranges of effects a study has sufficient power to detect based on a sensitivity power analysis, and 6) which effect sizes are expected in a specific research area. Researchers can use the guidelines presented in this article, for example by using the interactive form in the accompanying online Shiny app, to improve their sample size justification, and hopefully, align the informational value of a study with their inferential goals.

159 citations
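
A minimal sketch of approach 3 (an a-priori power analysis) from the article above, using statsmodels. The smallest effect size of interest (Cohen's d = 0.4) is an assumed input that a researcher would have to justify; it is not a value from the article.

```python
# A-priori power analysis for a two-group comparison (independent-samples t test).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.4, alpha=0.05, power=0.80,
                                    alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.1f}")  # about 99 for these inputs
```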


Journal ArticleDOI
TL;DR: In this article, the authors compare the effects of nudge interventions reported in academic journals with those implemented at scale by Nudge Units in the United States and conclude that selective publication in the Academic Journals sample explains about 70 percent of the difference in effect sizes between the two samples.
Abstract: Nudge interventions have quickly expanded from academic studies to larger implementation in so‐called Nudge Units in governments. This provides an opportunity to compare interventions in research studies, versus at scale. We assemble a unique data set of 126 RCTs covering 23 million individuals, including all trials run by two of the largest Nudge Units in the United States. We compare these trials to a sample of nudge trials in academic journals from two recent meta‐analyses. In the Academic Journals papers, the average impact of a nudge is very large—an 8.7 percentage point take‐up effect, which is a 33.4% increase over the average control. In the Nudge Units sample, the average impact is still sizable and highly statistically significant, but smaller at 1.4 percentage points, an 8.0% increase. We document three dimensions which can account for the difference between these two estimates: (i) statistical power of the trials; (ii) characteristics of the interventions, such as topic area and behavioral channel; and (iii) selective publication. A meta‐analysis model incorporating these dimensions indicates that selective publication in the Academic Journals sample, exacerbated by low statistical power, explains about 70 percent of the difference in effect sizes between the two samples. Different nudge characteristics account for most of the residual difference.

76 citations


Journal ArticleDOI
TL;DR: This paper found that the median I2 was 96.9% (IQR 90.5-98.7%) in a sample of 134 meta-analyses of prevalence, with larger I2 values observed in meta-analyses with a higher number of studies and extreme pooled estimates (defined as <10% or >90%).
Abstract: Over the last decade, there has been a 10‐fold increase in the number of published systematic reviews of prevalence. In meta‐analyses of prevalence, the summary estimate represents an average prevalence from included studies. This estimate is truly informative only if there is no substantial heterogeneity among the different contexts being pooled. In systematic reviews, heterogeneity is usually explored with I‐squared statistic (I2), but this statistic does not directly inform us about the distribution of effects and frequently systematic reviewers and readers misinterpret this result. In a sample of 134 meta‐analyses of prevalence, the median I2 was 96.9% (IQR 90.5–98.7). We observed larger I2 in meta‐analysis with higher number of studies and extreme pooled estimates (defined as <10% or >90%). Studies with high I2 values were more likely to have conducted a sensitivity analysis, including subgroup analysis but only three (2%) systematic reviews reported prediction intervals. We observed that meta‐analyses of prevalence often present high I2 values. However, the number of studies included in the meta‐analysis and the point estimate can be associated with the I2 value, and a high I2 value is not always synonymous with high heterogeneity. In meta‐analyses of prevalence, I2 statistics may not be discriminative and should be interpreted with caution, avoiding arbitrary thresholds. To discuss heterogeneity, reviewers should focus on the description of the expected range of estimates, which can be done using prediction intervals and planned sensitivity analysis.

61 citations
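
The prediction interval the authors recommend reporting alongside I2 can be computed directly from standard random-effects meta-analysis output. The sketch below uses the usual Higgins-style formula; the pooled prevalence, its standard error, tau-squared, and the number of studies are made-up placeholder values (and in practice prevalence is often pooled on a transformed, e.g. logit, scale).

```python
# 95% prediction interval for a random-effects meta-analysis.
import math
from scipy.stats import t

def prediction_interval(mu, se_mu, tau2, k, level=0.95):
    """Interval in which the effect of a new, comparable study is expected to fall."""
    t_crit = t.ppf(1 - (1 - level) / 2, df=k - 2)
    half_width = t_crit * math.sqrt(tau2 + se_mu ** 2)
    return mu - half_width, mu + half_width

low, high = prediction_interval(mu=0.25, se_mu=0.02, tau2=0.01, k=30)
print(f"95% prediction interval: {low:.3f} to {high:.3f}")
```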


Journal ArticleDOI
01 May 2022 - Neuron
TL;DR: This paper showed that cross-sectional brain-behavior correlations are often small and unreliable without large samples and pushed human neuroscience toward study designs that either maximize sample sizes to detect small effects or maximize effect sizes using focused investigations.

57 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compared the statistical power of discrete (k-means), fuzzy (c-means), and finite mixture modelling approaches (which include latent class analysis and latent profile analysis).
Abstract: Cluster algorithms are gaining in popularity in biomedical research due to their compelling ability to identify discrete subgroups in data, and their increasing accessibility in mainstream software. While guidelines exist for algorithm selection and outcome evaluation, there are no firmly established ways of computing a priori statistical power for cluster analysis. Here, we estimated power and classification accuracy for common analysis pipelines through simulation. We systematically varied subgroup size, number, separation (effect size), and covariance structure. We then subjected generated datasets to dimensionality reduction approaches (none, multi-dimensional scaling, or uniform manifold approximation and projection) and cluster algorithms (k-means, agglomerative hierarchical clustering with Ward or average linkage and Euclidean or cosine distance, HDBSCAN). Finally, we directly compared the statistical power of discrete (k-means), "fuzzy" (c-means), and finite mixture modelling approaches (which include latent class analysis and latent profile analysis). We found that clustering outcomes were driven by large effect sizes or the accumulation of many smaller effects across features, and were mostly unaffected by differences in covariance structure. Sufficient statistical power was achieved with relatively small samples (N = 20 per subgroup), provided cluster separation is large (Δ = 4). Finally, we demonstrated that fuzzy clustering can provide a more parsimonious and powerful alternative for identifying separable multivariate normal distributions, particularly those with slightly lower centroid separation (Δ = 3). Traditional intuitions about statistical power only partially apply to cluster analysis: increasing the number of participants above a sufficient sample size did not improve power, but effect size was crucial. Notably, for the popular dimensionality reduction and clustering algorithms tested here, power was only satisfactory for relatively large effect sizes (clear separation between subgroups). Fuzzy clustering provided higher power in multivariate normal distributions. Overall, we recommend that researchers (1) only apply cluster analysis when large subgroup separation is expected, (2) aim for sample sizes of N = 20 to N = 30 per expected subgroup, (3) use multi-dimensional scaling to improve cluster separation, and (4) use fuzzy clustering or mixture modelling approaches that are more powerful and more parsimonious with partially overlapping multivariate normal distributions.

50 citations
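
A stripped-down illustration of the simulation logic described above (not the authors' exact pipeline): generate two multivariate normal subgroups with a chosen centroid separation, run k-means, and score recovery with the adjusted Rand index; "power" is the proportion of simulations that clear an ARI threshold. The sample size, feature count, and ARI cutoff are assumed values.

```python
# Simulation-based "power" estimate for k-means cluster recovery.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cluster_power(n_per_group=20, n_features=5, delta=4.0, n_sims=200,
                  ari_threshold=0.8, seed=0):
    rng = np.random.default_rng(seed)
    truth = np.repeat([0, 1], n_per_group)
    hits = 0
    for _ in range(n_sims):
        g0 = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
        # Spread the total centroid separation `delta` evenly across features.
        g1 = rng.normal(delta / np.sqrt(n_features), 1.0, size=(n_per_group, n_features))
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(np.vstack([g0, g1]))
        hits += adjusted_rand_score(truth, labels) >= ari_threshold
    return hits / n_sims

print(cluster_power(delta=4.0), cluster_power(delta=2.0))
```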


Journal ArticleDOI
TL;DR: In this article, the influence of Problem-Based Learning (PBL) on the effectiveness of instructional interventions for Critical Thinking (CT) in higher education was analyzed through a meta-analysis synthesizing 50 relevant empirical studies from 2000 to 2021.

Journal ArticleDOI
TL;DR: In this paper, the heterogeneity of rock joint surface roughness is characterized based on a statistical analysis of all samples extracted from different locations of a given rock joint; the results show that the expected value obtained from conventional methods fails to accurately represent the overall roughness.
Abstract: Rock joint surface roughness is usually characterized by heterogeneity, but the determination of a required number of samples for achieving a reasonable heterogeneity assessment remains a challenge. In this paper, a novel method, the global search method, was proposed to investigate the heterogeneity of rock joint roughness. In this method, the roughness heterogeneity was characterized based on a statistical analysis of the roughness of all samples extracted from different locations of a given rock joint. Analyses of the effective sample number were conducted, which showed that sampling bias was caused by an inadequate number of samples. To overcome this drawback, a large natural slate joint sample (1000 mm × 1000 mm in size) was digitized in a laboratory using a high-accuracy laser scanner. The roughness heterogeneities of both two-dimensional (2D) profiles and three-dimensional (3D) surface topographies were systematically investigated. The results show that the expected value obtained from conventional methods failed to accurately represent the overall roughness. The relative errors between the population parameter and the expected value varied not only from sample to sample but also with the scale. The roughness heterogeneity characteristics of joint samples of various sizes can be obtained using the global search method. This new method could facilitate the determination of the most representative samples and their positions.
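
The windowed-sampling idea can be illustrated with a common 2D roughness statistic: the sketch below computes Z2 (the root-mean-square first derivative of a profile) for every window position along a profile and summarizes the spread across positions. The synthetic profile, window length, and sampling interval are placeholders, and this is an illustration of the sampling concept rather than the authors' exact global search procedure.

```python
# Roughness (Z2) computed over sliding profile windows to expose heterogeneity.
import numpy as np

def z2(profile: np.ndarray, dx: float) -> float:
    """Root-mean-square of the profile's first derivative."""
    slopes = np.diff(profile) / dx
    return float(np.sqrt(np.mean(slopes ** 2)))

def roughness_over_windows(profile: np.ndarray, dx: float, window_pts: int) -> np.ndarray:
    return np.asarray([z2(profile[i:i + window_pts], dx)
                       for i in range(len(profile) - window_pts + 1)])

# Synthetic 1 m profile sampled every 0.5 mm; a real analysis would use scanner data.
rng = np.random.default_rng(1)
heights = np.cumsum(rng.normal(0, 0.05, 2001))
stats = roughness_over_windows(heights, dx=0.5, window_pts=200)  # 100 mm windows
print(f"Z2 mean {stats.mean():.3f}, std {stats.std():.3f} over {stats.size} window positions")
```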

Journal ArticleDOI
TL;DR: In this paper, a meta-analytic study quantifies globally reported gaming disorder (GD) prevalence rates and explores their various moderating variables, including region, sample size, year of data collection, age group, study design, sampling method, survey format, sample type, risk of bias, terminology, assessment tool, and male proportion.

Journal ArticleDOI
TL;DR: This article provides a systematic review of the application of Machine Learning (ML) in thermal comfort studies, highlighting the latest methods and findings and providing an agenda for future studies.

Journal ArticleDOI
20 Jan 2022 - PeerJ
TL;DR: This article showed that having few random effects levels does not strongly influence the parameter estimates or uncertainty around those estimates for fixed effects terms, at least in the case presented here, and that the coverage probability of fixed effects estimates is sample size dependent.
Abstract: As linear mixed-effects models (LMMs) have become a widespread tool in ecology, the need to guide the use of such tools is increasingly important. One common guideline is that one needs at least five levels of the grouping variable associated with a random effect. Having so few levels makes the estimation of the variance of random effects terms (such as ecological sites, individuals, or populations) difficult, but it need not muddy one’s ability to estimate fixed effects terms—which are often of primary interest in ecology. Here, I simulate datasets and fit simple models to show that having few random effects levels does not strongly influence the parameter estimates or uncertainty around those estimates for fixed effects terms—at least in the case presented here. Instead, the coverage probability of fixed effects estimates is sample size dependent. LMMs including low-level random effects terms may come at the expense of increased singular fits, but this did not appear to influence coverage probability or RMSE, except in low sample size (N = 30) scenarios. Thus, it may be acceptable to use fewer than five levels of random effects if one is not interested in making inferences about the random effects terms (i.e. when they are ‘nuisance’ parameters used to group non-independent data), but further work is needed to explore alternative scenarios. Given the widespread accessibility of LMMs in ecology and evolution, future simulation studies and further assessments of these statistical methods are necessary to understand the consequences both of violating and of routinely following simple guidelines.
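
A minimal re-creation of the kind of simulation described above: nested data with only a few grouping levels, a linear mixed model fitted to each simulated dataset, and the coverage of the 95% confidence interval for the fixed slope. The group counts, variance components, and true slope are illustrative choices, not the values used in the paper.

```python
# Coverage of the fixed-effect CI when the random effect has few levels.
import warnings
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def fixed_effect_coverage(n_groups=5, n_per_group=10, slope=0.5, sd_group=1.0,
                          sd_resid=1.0, n_sims=200, seed=0):
    rng = np.random.default_rng(seed)
    covered = 0
    for _ in range(n_sims):
        group = np.repeat(np.arange(n_groups), n_per_group)
        x = rng.normal(size=group.size)
        y = (slope * x + rng.normal(0, sd_group, n_groups)[group]
             + rng.normal(0, sd_resid, group.size))
        data = pd.DataFrame({"y": y, "x": x, "group": group})
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # singular/convergence warnings are expected with few groups
            fit = smf.mixedlm("y ~ x", data, groups=data["group"]).fit(reml=True)
        lo, hi = fit.conf_int().loc["x"]
        covered += lo <= slope <= hi
    return covered / n_sims

print(fixed_effect_coverage(n_groups=3), fixed_effect_coverage(n_groups=10))
```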

Journal ArticleDOI
TL;DR: Performance metrics used for evaluating age prediction models depend on cohort and study‐specific data characteristics, and cannot be directly compared across different studies.
Abstract: Estimating age based on neuroimaging‐derived data has become a popular approach to developing markers for brain integrity and health. While a variety of machine‐learning algorithms can provide accurate predictions of age based on brain characteristics, there is significant variation in model accuracy reported across studies. We predicted age in two population‐based datasets, and assessed the effects of age range, sample size and age‐bias correction on the model performance metrics Pearson's correlation coefficient (r), the coefficient of determination (R2), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The results showed that these metrics vary considerably depending on cohort age range; r and R2 values are lower when measured in samples with a narrower age range. RMSE and MAE are also lower in samples with a narrower age range due to smaller errors/brain age delta values when predictions are closer to the mean age of the group. Across subsets with different age ranges, performance metrics improve with increasing sample size. Performance metrics further vary depending on prediction variance as well as mean age difference between training and test sets, and age‐bias corrected metrics indicate high accuracy—also for models showing poor initial performance. In conclusion, performance metrics used for evaluating age prediction models depend on cohort and study‐specific data characteristics, and cannot be directly compared across different studies. Since age‐bias corrected metrics generally indicate high accuracy, even for poorly performing models, inspection of uncorrected model results provides important information about underlying model attributes such as prediction variance.
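
A quick numeric illustration of the range-restriction point made above: with the same absolute prediction error, r and R2 drop sharply when the age range is narrowed, while MAE and RMSE barely move (in this simplified simulation the errors are homoscedastic, so the absolute-error metrics stay flat rather than shrinking). The data are simulated, not from the study.

```python
# Age-prediction performance metrics on a wide versus narrow age range.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
age = rng.uniform(20, 80, 2000)
predicted = age + rng.normal(0, 6, age.size)  # roughly 6-year prediction error everywhere

def report(label, y, yhat):
    r, _ = pearsonr(y, yhat)
    rmse = float(np.sqrt(mean_squared_error(y, yhat)))
    print(f"{label}: r={r:.2f}  R2={r2_score(y, yhat):.2f}  "
          f"RMSE={rmse:.1f}  MAE={mean_absolute_error(y, yhat):.1f}")

report("wide age range (20-80)", age, predicted)
narrow = (age >= 45) & (age <= 55)
report("narrow age range (45-55)", age[narrow], predicted[narrow])
```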

Journal ArticleDOI
TL;DR: In this article, a 2 × 2 framework based on sampling goal and methodology is proposed for screening and evaluating the quality of online samples, where screeners can be categorized as direct, which screen individual responses, or statistical, which provide quantitative signals of low quality.

Journal ArticleDOI
TL;DR: The UK Biobank imaging sub-sample, as discussed by the authors, showed a consistent, statistically significant 'healthy' bias compared with the full cohort; effect sizes were small to moderate based on Cohen's d/Cramér's V metrics (range = 0.02 to -0.21, with Townsend deprivation showing the largest effect size).
Abstract: UK Biobank is a prospective cohort study of around half-a-million general population participants, recruited between 2006 and 2010, with baseline studies at recruitment and multiple assessments since. From 2014 to date, magnetic resonance imaging (MRI) has been pursued in a participant sub-sample, with the aim to scan around n = 100k. This sub-sample is studied widely and therefore understanding its relative characteristics is important for future reports. We aimed to quantify psychological and physical health in the UK Biobank imaging sub-sample, compared with the rest of the cohort. We used t-tests and χ2 for continuous/categorical variables, respectively, to estimate average differences on a range of cognitive, mental and physical health phenotypes. We contrasted baseline values of participants who attended imaging (versus had not), and compared their values at the imaging visit versus baseline values of participants who were not scanned. We also tested the hypothesis that the associations of established risk factors with worse cognition would be underestimated in the (hypothesized) healthier imaging group compared with the full cohort. We tested these interactions using linear regression models. On a range of cognitive, mental health, cardiometabolic, inflammatory and neurological phenotypes, we found that 47 920 participants who were scanned by January 2021 showed consistent statistically significant 'healthy' bias compared with the ∼450 000 who were not scanned. These effect sizes were small to moderate based on Cohen's d/Cramer's V metrics (range = 0.02 to -0.21 for Townsend, the largest effect size). We found evidence of interaction, where stratified analysis demonstrated that associations of established cognitive risk factors were smaller in the imaging sub-sample compared with the full cohort. Of the ∼100 000 participants who ultimately will undergo MRI assessment within UK Biobank, the first ∼50 000 showed some 'healthy' bias on a range of metrics at baseline. Those differences largely remained at the subsequent (first) imaging visit, and we provide evidence that testing associations in the imaging sub-sample alone could lead to potential underestimation of exposure/outcome estimates.
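
For reference, the two effect-size metrics quoted above can be computed with a few lines of NumPy/SciPy; this is a generic sketch, and the UK Biobank variables themselves are of course not reproduced here.

```python
# Cohen's d for continuous phenotypes, Cramér's V for categorical ones.
import numpy as np
from scipy.stats import chi2_contingency

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = x.size, y.size
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2))
    return (x.mean() - y.mean()) / pooled_sd

def cramers_v(table):
    """Effect size for a contingency table (rows x columns of counts)."""
    table = np.asarray(table, float)
    chi2, _, _, _ = chi2_contingency(table)
    return np.sqrt(chi2 / (table.sum() * (min(table.shape) - 1)))
```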

Journal ArticleDOI
TL;DR: In this article, the authors investigate the crucial role of the number of trials in task-based neuroimaging from the perspectives of statistical efficiency and condition-level generalizability, and find that increasing both the number of trials and the number of subjects improves statistical efficiency.

Journal ArticleDOI
TL;DR: In this paper , the authors present and discuss four parameters (namely level of confidence, precision, variability of the data, and anticipated loss) required for sample size calculation for prevalence studies.
Abstract: Background: Although books and articles guiding the methods of sample size calculation for prevalence studies are available, we aim to guide, assist and report sample size calculation using the present calculators. Results: We present and discuss four parameters (namely level of confidence, precision, variability of the data, and anticipated loss) required for sample size calculation for prevalence studies. Choosing correct parameters with proper understanding, and reporting issues, are mainly discussed. We demonstrate the use of purposely-designed calculators that assist users in making properly informed decisions and preparing an appropriate report. Conclusion: Two calculators can be used with free software (Spreadsheet and RStudio) that benefit researchers with limited resources. They will, hopefully, minimize the errors in parameter selection, calculation, and reporting. The calculators are available at: https://sites.google.com/view/sr-ln/ssc.
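
The calculation behind such calculators is the standard single-proportion sample-size formula, shown below with the four inputs discussed in the abstract: confidence level, precision (d), anticipated prevalence (the variability input), and anticipated loss. The numbers in the example call are placeholders, and this sketch is not the authors' calculator itself.

```python
# n = Z^2 * p * (1 - p) / d^2, inflated for anticipated loss/non-response.
import math
from scipy.stats import norm

def prevalence_sample_size(p, d, confidence=0.95, anticipated_loss=0.0):
    z = norm.ppf(1 - (1 - confidence) / 2)
    n = (z ** 2) * p * (1 - p) / d ** 2
    return math.ceil(n / (1 - anticipated_loss))  # inflate for expected dropout

print(prevalence_sample_size(p=0.30, d=0.05, confidence=0.95, anticipated_loss=0.10))
# 359 participants (323 before the 10% loss adjustment)
```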

Journal ArticleDOI
TL;DR: In this article, the authors argue that examining the effect of immersive virtual reality (I-VR) technology on learning has become necessary with the decreasing cost of virtual reality technologies and the development of high-quality head-mounted displays.

Journal ArticleDOI
TL;DR: This paper proposed a summary-statistics-based power analysis for mixed-effects modeling with two-level nested data (for both binary and continuous predictors), complementing the existing formula-based and simulation-based methods.
Abstract: This article proposes a summary-statistics-based power analysis-a practical method for conducting power analysis for mixed-effects modeling with two-level nested data (for both binary and continuous predictors), complementing the existing formula-based and simulation-based methods. The proposed method bases its logic on conditional equivalence of the summary-statistics approach and mixed-effects modeling, paring back the power analysis for mixed-effects modeling to that for a simpler statistical analysis (e.g., one-sample t test). Accordingly, the proposed method allows us to conduct power analysis for mixed-effects modeling using popular software such as G*Power or the pwr package in R and, with minimum input from relevant prior work (e.g., t value). We provide analytic proof and a series of statistical simulations to show the validity and robustness of the summary-statistics-based power analysis and show illustrative examples with real published work. We also developed a web app (https://koumurayama.shinyapps.io/summary_statistics_based_power/) to facilitate the utility of the proposed method. While the proposed method has limited flexibilities compared with the existing methods in terms of the models and designs that can be appropriately handled, it provides a convenient alternative for applied researchers when there is limited information to conduct power analysis. (PsycInfo Database Record (c) 2022 APA, all rights reserved).
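
A rough illustration of the logic (not the authors' Shiny app): collapse the two-level design to one summary score per participant, convert a t value from prior work into an effect size, and run a one-sample t-test power analysis on that summary statistic. The prior t value and sample size are made-up inputs.

```python
# Summary-statistics-style power analysis via a one-sample t test.
import math
from statsmodels.stats.power import TTestPower

t_prior, n_prior = 2.6, 40        # hypothetical values from an earlier study
d = t_prior / math.sqrt(n_prior)  # effect size of the participant-level summary statistic

n_needed = TTestPower().solve_power(effect_size=d, alpha=0.05, power=0.80,
                                    alternative="two-sided")
print(f"d = {d:.3f}, required participants = {math.ceil(n_needed)}")
```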

Journal ArticleDOI
TL;DR: A unified MR approach, MR-APSS, is proposed, which accounts for pleiotropy and sample structure simultaneously by leveraging genome-wide information and allows more genetic variants with moderate effects to be included as instrumental variables (IVs) to improve statistical power without inflating type I errors.
Abstract: Mendelian Randomization (MR) is a valuable tool for inferring causal relationships among a wide range of traits using summary statistics from genome-wide association studies (GWASs). Existing summary-level MR methods often rely on strong assumptions, resulting in many false positive findings. To relax MR assumptions, ongoing research has been primarily focused on accounting for confounding due to pleiotropy. Here we show that sample structure is another major confounding factor, including population stratification, cryptic relatedness, and sample overlap. We propose a unified MR approach, MR-APSS, which (i) accounts for pleiotropy and sample structure simultaneously by leveraging genome-wide information; and (ii) allows to include more genetic variants with moderate effects as instrument variables (IVs) to improve statistical power without inflating type I errors. We first evaluated MR-APSS using comprehensive simulations and negative controls, and then applied MR-APSS to study the causal relationships among a collection of diverse complex traits. The results suggest that MR-APSS can better identify plausible causal relationships with high reliability. In particular, MR-APSS can perform well for highly polygenic traits, where the IV strengths tend to be relatively weak and existing summary-level MR methods for causal inference are vulnerable to confounding effects.
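
MR-APSS itself is not re-implemented here; for orientation only, the sketch below shows the textbook fixed-effect inverse-variance-weighted (IVW) estimate that summary-level MR methods build on, applied to made-up per-variant summary statistics.

```python
# Fixed-effect IVW Mendelian randomization estimate from summary statistics.
import numpy as np

def ivw_estimate(beta_exp, beta_out, se_out):
    """Weighted average of per-variant ratio estimates beta_out / beta_exp."""
    beta_exp, beta_out, se_out = map(np.asarray, (beta_exp, beta_out, se_out))
    w = beta_exp ** 2 / se_out ** 2
    estimate = np.sum(w * (beta_out / beta_exp)) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return estimate, se

beta_x = np.array([0.08, 0.05, 0.10, 0.07])   # hypothetical SNP-exposure effects
beta_y = np.array([0.020, 0.012, 0.027, 0.016])  # hypothetical SNP-outcome effects
se_y = np.array([0.005, 0.006, 0.004, 0.005])
print(ivw_estimate(beta_x, beta_y, se_y))
```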

Journal ArticleDOI
TL;DR: The importance of closer collaboration between data scientists and medical laboratory professionals in order to correctly characterise the relevant population, select the most appropriate statistical and analytical methods, ensure reproducibility, enable the proper interpretation of the results, and gain actual utility by using machine learning methods in clinical practice is discussed.
Abstract: The current gold standard for COVID-19 diagnosis, the rRT-PCR test, is hampered by long turnaround times, probable reagent shortages, high false-negative rates and high prices. As a result, machine learning (ML) methods have recently piqued interest, particularly when applied to digital imagery (X-rays and CT scans). In this review, the literature on ML-based diagnostic and prognostic studies grounded on hematochemical parameters has been considered. By doing so, a gap in the current literature was addressed concerning the application of machine learning to laboratory medicine. Sixty-eight articles have been included that were extracted from the Scopus and PubMed indexes. These studies were marked by a great deal of heterogeneity in terms of the examined laboratory test and clinical parameters, sample size, reference populations, ML algorithms, and validation approaches. The majority of research was found to be hampered by reporting and replicability issues: only four of the surveyed studies provided complete information on analytic procedures (units of measure, analyzing equipment), while 29 provided no information at all. Only 16 studies included independent external validation. In light of these findings, we discuss the importance of closer collaboration between data scientists and medical laboratory professionals in order to correctly characterise the relevant population, select the most appropriate statistical and analytical methods, ensure reproducibility, enable the proper interpretation of the results, and gain actual utility by using machine learning methods in clinical practice.

Journal ArticleDOI
TL;DR: This article examined the relationship between teachers' self-efficacy and attitudes toward inclusive education of K-12 students with special educational needs and identified potential moderators (publication, sample, and research procedure characteristics).


Journal ArticleDOI
TL;DR: The only way to protect ourselves from p-hacking would be to publish a statistical plan before experiments are initiated, describing the outcomes of interest and the corresponding statistical analyses to be performed, and to use multiple diversity metrics as an outcome measure.
Abstract: Background: Since sequencing techniques have become less expensive, larger sample sizes are applicable for microbiota studies. The aim of this study is to show how, and to what extent, different diversity metrics and different compositions of the microbiota influence the needed sample size to observe dissimilar groups. Empirical 16S rRNA amplicon sequence data obtained from animal experiments, observational human data, and simulated data were used to perform retrospective power calculations. A wide variation of alpha diversity and beta diversity metrics were used to compare the different microbiota datasets and the effect on the sample size. Results: Our data showed that beta diversity metrics are the most sensitive to observe differences as compared with alpha diversity metrics. The structure of the data influenced which alpha metrics are the most sensitive. Regarding beta diversity, the Bray–Curtis metric is in general the most sensitive to observe differences between groups, resulting in lower sample size and potential publication bias. Conclusion: We recommend performing power calculations and to use multiple diversity metrics as an outcome measure. To improve microbiota studies, awareness needs to be raised on the sensitivity and bias for microbiota research outcomes created by the used metrics rather than biological differences. We have seen that different alpha and beta diversity metrics lead to different study power: because of this, one could be naturally tempted to try all possible metrics until one or more are found that give a statistically significant test result, i.e., p-value < α. This way of proceeding is one of the many forms of the so-called p-value hacking. To this end, in our opinion, the only way to protect ourselves from (the temptation of) p-hacking would be to publish a statistical plan before experiments are initiated, describing the outcomes of interest and the corresponding statistical analyses to be performed.
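
A toy version of the retrospective power calculation described above for a single alpha-diversity metric: simulate group values, test with a rank-sum test, and take the rejection rate as power. The assumed group means, standard deviation, and effect size are placeholders; repeating the exercise with different metrics is what exposes the metric-dependence the authors warn about.

```python
# Simulation-based power for comparing an alpha-diversity metric between two groups.
import numpy as np
from scipy.stats import mannwhitneyu

def power_for_metric(n_per_group, mean_diff, sd=0.5, n_sims=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(3.0, sd, n_per_group)              # e.g. Shannon index, group A
        b = rng.normal(3.0 + mean_diff, sd, n_per_group)  # group B
        _, p = mannwhitneyu(a, b, alternative="two-sided")
        rejections += p < alpha
    return rejections / n_sims

for n in (10, 20, 40):
    print(n, power_for_metric(n, mean_diff=0.3))
```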

Posted ContentDOI
23 Jul 2022 - bioRxiv
TL;DR: It is demonstrated that the joint reliability of both biological and clinical/cognitive phenotypic measurements must be optimized in order to ensure biomarkers are reproducible and accurate, and that the prioritization of reliable phenotyping will revolutionize neurobiological and clinical endeavors that are focused on brain and behavior.
Abstract: Biomarkers of behavior and psychiatric illness for cognitive and clinical neuroscience remain out of reach1–4. Suboptimal reliability of biological measurements, such as functional magnetic resonance imaging (fMRI), is increasingly cited as a primary culprit for discouragingly large sample size requirements and poor reproducibility of brain-based biomarker discovery1,5–7. In response, steps are being taken towards optimizing MRI reliability and increasing sample sizes8–11, though this will not be enough. Optimizing biological measurement reliability and increasing sample sizes are necessary but insufficient steps for biomarker discovery; this focus has overlooked the ‘other side of the equation’ - the reliability of clinical and cognitive assessments - which are often suboptimal or unassessed. Through a combination of simulation analysis and empirical studies using neuroimaging data, we demonstrate that the joint reliability of both biological and clinical/cognitive phenotypic measurements must be optimized in order to ensure biomarkers are reproducible and accurate. Even with best-case scenario high reliability neuroimaging measurements and large sample sizes, we show that suboptimal reliability of phenotypic data (i.e., clinical diagnosis, behavioral and cognitive measurements) will continue to impede meaningful biomarker discovery for the field. Improving reliability through development of novel assessments of phenotypic variation is needed, but it is not the sole solution. We emphasize the potential to improve the reliability of established phenotypic methods through aggregation across multiple raters and/or measurements12–15, which is becoming increasingly feasible with recent innovations in data acquisition (e.g., web- and smart-phone-based administration, ecological momentary assessment, burst sampling, wearable devices, multimodal recordings)16–20. We demonstrate that such aggregation can achieve better biomarker discovery for a fraction of the cost engendered by large-scale samples. Although the current study has been motivated by ongoing developments in neuroimaging, the prioritization of reliable phenotyping will revolutionize neurobiological and clinical endeavors that are focused on brain and behavior.
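
The core argument can be put in numbers with the classic Spearman attenuation formula: unreliable brain and phenotype measures shrink the observable correlation, and, since required sample size scales roughly with 1/r2 for small effects, the needed n grows by about 1/(rel_brain x rel_pheno). The true effect size and reliability values below are illustrative choices, not estimates from the preprint.

```python
# Attenuation of a brain-phenotype correlation and the resulting sample-size inflation.
import math

def attenuated_r(r_true, rel_brain, rel_pheno):
    """Spearman attenuation: the observable correlation given imperfect reliability."""
    return r_true * math.sqrt(rel_brain * rel_pheno)

def sample_size_inflation(rel_brain, rel_pheno):
    """Approximate factor by which required n grows relative to perfectly reliable measures."""
    return 1.0 / (rel_brain * rel_pheno)

r_true = 0.20
for rel_b, rel_p in [(0.9, 0.9), (0.9, 0.5), (0.6, 0.5)]:
    print(f"reliabilities ({rel_b}, {rel_p}): "
          f"observed r = {attenuated_r(r_true, rel_b, rel_p):.3f}, "
          f"needs about {sample_size_inflation(rel_b, rel_p):.1f}x the sample")
```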

Journal ArticleDOI
TL;DR: Wastewater-based surveillance of AMR appears promising, with high overall concordance between wastewater and human AMR prevalence estimates across studies, irrespective of their heterogeneous approaches.

Journal ArticleDOI
TL;DR: Many RCTs for COVID-19 had a low fragility index, challenging confidence in the robustness of the results, and a median of 4 events was required to change the analysis findings from statistically significant to not significant.
Abstract: Key Points. Question: In randomized clinical trials (RCTs) of COVID-19 that report statistically significant results, what is the fragility index, ie, the minimum number of participants who would need to have had a different outcome for the RCT to lose statistical significance? Findings: In this cross-sectional study of 47 RCTs with a total of 138 235 participants that had statistically significant results, the median fragility index was 4. That is, a median of 4 events was required to change the analysis findings from statistically significant to not significant. Meaning: In this study, many RCTs for COVID-19 had a low fragility index, challenging confidence in the robustness of the results.
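
A sketch of how a fragility index like the one reported above can be computed for a single two-arm trial with a binary outcome: flip outcomes in the arm with fewer events, one participant at a time, until Fisher's exact test is no longer significant. The event counts in the example are made up.

```python
# Fragility index for a 2x2 trial result using Fisher's exact test.
from scipy.stats import fisher_exact

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    flips = 0
    while True:
        _, p = fisher_exact([[events_a, n_a - events_a],
                             [events_b, n_b - events_b]])
        if p >= alpha:
            return flips
        # Convert one non-event to an event in the arm with fewer events (usual convention).
        if events_a <= events_b:
            events_a += 1
        else:
            events_b += 1
        flips += 1

print(fragility_index(events_a=10, n_a=100, events_b=25, n_b=100))
```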