Book•

Statistics: The Art and Science of Learning from Data

TL;DR: This introductory statistics textbook covers gathering and exploring data, probability and sampling distributions, inferential statistics, and methods for analyzing association, including regression, analysis of variance, and nonparametric statistics.
Abstract:
Part 1: Gathering and Exploring Data
1. Statistics: The Art and Science of Learning from Data 1.1 Using Data to Answer Statistical Questions 1.2 Sample Versus Population 1.3 Using Calculators and Computers Chapter Summary Chapter Problems
2. Exploring Data with Graphs and Numerical Summaries 2.1 Different Types of Data 2.2 Graphical Summaries of Data 2.3 Measuring the Center of Quantitative Data 2.4 Measuring the Variability of Quantitative Data 2.5 Using Measures of Position to Describe Variability 2.6 Recognizing and Avoiding Misuses of Graphical Summaries Chapter Summary Chapter Problems
3. Association: Contingency, Correlation, and Regression 3.1 The Association Between Two Categorical Variables 3.2 The Association Between Two Quantitative Variables 3.3 Predicting the Outcome of a Variable 3.4 Cautions in Analyzing Associations Chapter Summary Chapter Problems
4. Gathering Data 4.1 Experimental and Observational Studies 4.2 Good and Poor Ways to Sample 4.3 Good and Poor Ways to Experiment 4.4 Other Ways to Conduct Experimental and Nonexperimental Studies Chapter Summary Chapter Problems
Part 1 Review Part 1 Questions Part 1 Exercises
Part 2: Probability, Probability Distributions, and Sampling Distributions
5. Probability in Our Daily Lives 5.1 How Probability Quantifies Randomness 5.2 Finding Probabilities 5.3 Conditional Probability: The Probability of A Given B 5.4 Applying the Probability Rules Chapter Summary Chapter Problems
6. Probability Distributions 6.1 Summarizing Possible Outcomes and Their Probabilities 6.2 Probabilities for Bell-Shaped Distributions 6.3 Probabilities When Each Observation Has Two Possible Outcomes Chapter Summary Chapter Problems
7. Sampling Distributions 7.1 How Sample Proportions Vary Around the Population Proportion 7.2 How Sample Means Vary Around the Population Mean 7.3 The Binomial Distribution Is a Sampling Distribution (Optional) Chapter Summary Chapter Problems
Part 2 Review Part 2 Questions Part 2 Exercises
Part 3: Inferential Statistics
8. Statistical Inference: Confidence Intervals 8.1 Point and Interval Estimates of Population Parameters 8.2 Constructing a Confidence Interval to Estimate a Population Proportion 8.3 Constructing a Confidence Interval to Estimate a Population Mean 8.4 Choosing the Sample Size for a Study 8.5 Using Computers to Make New Estimation Methods Possible Chapter Summary Chapter Problems
9. Statistical Inference: Significance Tests about Hypotheses 9.1 Steps for Performing a Significance Test 9.2 Significance Tests about Proportions 9.3 Significance Tests about Means 9.4 Decisions and Types of Errors in Significance Tests 9.5 Limitations of Significance Tests 9.6 The Likelihood of a Type II Error (Not Rejecting H0, Even Though It's False) Chapter Summary Chapter Problems
10. Comparing Two Groups 10.1 Categorical Response: Comparing Two Proportions 10.2 Quantitative Response: Comparing Two Means 10.3 Other Ways of Comparing Means and Comparing Proportions 10.4 Analyzing Dependent Samples 10.5 Adjusting for the Effects of Other Variables Chapter Summary Chapter Problems
Part 3 Review Part 3 Questions Part 3 Exercises
Part 4: Analyzing Association and Extended Statistical Methods
11. Analyzing the Association Between Categorical Variables 11.1 Independence and Association 11.2 Testing Categorical Variables for Independence 11.3 Determining the Strength of the Association 11.4 Using Residuals to Reveal the Pattern of Association 11.5 Small Sample Sizes: Fisher's Exact Test Chapter Summary Chapter Problems
12. Analyzing the Association Between Quantitative Variables: Regression Analysis 12.1 Model How Two Variables Are Related 12.2 Describe Strength of Association 12.3 Make Inference About the Association 12.4 How the Data Vary Around the Regression Line 12.5 Exponential Regression: A Model for Nonlinearity Chapter Summary Chapter Problems
13. Multiple Regression 13.1 Using Several Variables to Predict a Response 13.2 Extending the Correlation and R-squared for Multiple Regression 13.3 Using Multiple Regression to Make Inferences 13.4 Checking a Regression Model Using Residual Plots 13.5 Regression and Categorical Predictors 13.6 Modeling a Categorical Response Chapter Summary Chapter Problems
14. Comparing Groups: Analysis of Variance Methods 14.1 One-Way ANOVA: Comparing Several Means 14.2 Estimating Differences in Groups for a Single Factor 14.3 Two-Way ANOVA Chapter Summary Chapter Problems
15. Nonparametric Statistics 15.1 Compare Two Groups by Ranking 15.2 Nonparametric Methods for Several Groups and for Matched Pairs Chapter Summary Chapter Problems
Part 4 Review Part 4 Questions Part 4 Exercises
Tables Answers Index Index of Applications Photo Credits
Citations
Journal Article•DOI•
TL;DR: The authors measure brain activity during motor sequencing and characterize network properties based on coherent activity between brain regions. They find that the complex reconfiguration patterns of the brain's putative functional modules that control learning can be described parsimoniously by the combined presence of a relatively stiff temporal core, composed primarily of sensorimotor and visual regions whose connectivity changes little in time, and a flexible temporal periphery, composed primarily of multimodal association regions whose connectivity changes frequently.
Abstract: As a person learns a new skill, distinct synapses, brain regions, and circuits are engaged and change over time. In this paper, we develop methods to examine patterns of correlated activity across a large set of brain regions. Our goal is to identify properties that enable robust learning of a motor skill. We measure brain activity during motor sequencing and characterize network properties based on coherent activity between brain regions. Using recently developed algorithms to detect time-evolving communities, we find that the complex reconfiguration patterns of the brain's putative functional modules that control learning can be described parsimoniously by the combined presence of a relatively stiff temporal core that is composed primarily of sensorimotor and visual regions whose connectivity changes little in time and a flexible temporal periphery that is composed primarily of multimodal association regions whose connectivity changes frequently. The separation between temporal core and periphery changes over the course of training and, importantly, is a good predictor of individual differences in learning success. The core of dynamically stiff regions exhibits dense connectivity, which is consistent with notions of core-periphery organization established previously in social networks. Our results demonstrate that core-periphery organization provides an insightful way to understand how putative functional modules are linked. This, in turn, enables the prediction of fundamental human capacities, including the production of complex goal-directed behavior.

265 citations

Journal Article•DOI•
TL;DR: Over the past few decades a large amount of research has been dedicated to the teaching of statistics, and the impact of this research has started to change course content and structure.
Abstract: Over the past few decades there has been a large amount of research dedicated to the teaching of statistics. The impact of this research has started to change course content and structure, in both ...

156 citations

Journal Article•DOI•
TL;DR: The introductory statistics course has traditionally targeted consumers of statistics with the intent of producing a citizenry capable of a critical analysis of basic published statistics. More recent efforts center the course on real data, and the success of that approach is predicated on providing data that students see as real and relevant.
Abstract: Statistics and the Modern Student. Robert Gould, University of California, Los Angeles, rgould@stat.ucla.edu. Summary: The introductory statistics course has traditionally targeted consumers of statistics with the intent of producing a citizenry capable of a critical analysis of basic published statistics. More recently, statistics educators have attempted to center the intro course on real data, in part to motivate students and in part to create a more relevant course. The success of this approach is predicated on providing data that the students see as real and relevant. Modern students, however, have a different view of data than did students of 10 or even 5 years ago. Modern statistics courses must adjust to the fact that students' first exposure to data occurs outside the academy. Key words: Education; data technology; statistical literacy; technological literacy; data.
The time may not be very remote when it will be understood that for a complete initiation as an efficient citizen of one of the new great complex world wide states that are now developing, it is as necessary to be able to compute, to think in averages and maxima and minima, as it is now to be able to read and write. --H.G. Wells (1903, pg 204)
1.0 History and Background of Intro Statistics
Perhaps because Wells was best known for his science fiction, the above quote, written in 1904, is often attributed an oracular quality. But Francis Galton and Florence Nightingale were alive in 1904, and so perhaps the quote is better understood as a reflection of the excitement that grew as many realized that statistics was no longer just for astronomers, but useful in the social sciences, medical science, biology, criminology,

136 citations

Journal Article•DOI•
TL;DR: In this article, the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores is discussed, and it is shown that removing outliers based on commonly used Z value thresholds severely increases the Type I error rate.
Abstract: In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present.

106 citations
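
The outlier-removal effect described in the abstract above can be explored with a small simulation. The following is a minimal sketch, not the authors' actual design: it assumes a sum score modeled as the number of correct answers on 10 items with a 0.2 success probability (standing in for a "short and difficult test"), 50 observations per group, a pooled-variance t statistic, and a normal-approximation critical value in place of an exact t quantile. All function names and parameter values are hypothetical.

```python
import random
import statistics
from math import sqrt

# Two-sided 5% critical value from the normal approximation (about 1.96).
# An exact t critical value would be slightly larger; this is a simplification.
CRIT = statistics.NormalDist().inv_cdf(0.975)


def sum_score(rng, n_items=10, p_correct=0.2):
    """Hypothetical sum score: number of items answered correctly."""
    return sum(rng.random() < p_correct for _ in range(n_items))


def drop_outliers(xs, z_cut):
    """Drop values whose within-sample |Z| exceeds z_cut."""
    m, s = statistics.fmean(xs), statistics.stdev(xs)
    if s == 0:
        return list(xs)
    return [x for x in xs if abs(x - m) / s <= z_cut]


def t_statistic(a, b):
    """Pooled-variance independent-samples t statistic."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    if pooled == 0:
        return 0.0
    return (statistics.fmean(a) - statistics.fmean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def type_i_rates(reps=2000, n=50, z_cut=2.0, seed=1):
    """Return (rejection rate without removal, rejection rate with removal)."""
    rng = random.Random(seed)
    plain = trimmed = 0
    for _ in range(reps):
        # Both groups share one distribution, so every rejection is a false positive.
        a = [sum_score(rng) for _ in range(n)]
        b = [sum_score(rng) for _ in range(n)]
        plain += abs(t_statistic(a, b)) > CRIT
        trimmed += abs(t_statistic(drop_outliers(a, z_cut),
                                   drop_outliers(b, z_cut))) > CRIT
    return plain / reps, trimmed / reps


if __name__ == "__main__":
    without_removal, with_removal = type_i_rates()
    print(f"Type I rate without removal: {without_removal:.3f}")
    print(f"Type I rate with |Z| > 2 removal: {with_removal:.3f}")
```

Comparing the two returned rates shows whether, under these assumed settings, trimming at a Z threshold changes the false-positive rate relative to the untrimmed test. Varying `z_cut`, `n`, or the item parameters gives a rough feel for how sensitive the result is to those choices.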

Journal Article•DOI•
Hui Zhou, Aihong Meng, Yanqiu Long, Qinghai Li, Yanguo Zhang
TL;DR: The chemical characteristics analysis of municipal solid waste showed that polyethylene (PE), polypropylene (PP), and polystyrene (PS) had the highest volatile matter content, with almost no ash and fixed carbon, while polyethylene terephthalate (PET) had high carbon content but low hydrogen content.
Abstract: Municipal solid waste (MSW) has been normally sorted into six categories, namely, food residue, wood waste, paper, textiles, plastics, and rubber. In each category, materials could be classified fu...

86 citations