Showing papers in "ETS Research Report Series in 1983"


Journal ArticleDOI
TL;DR: A practical example shows that the bias due to incomplete matching can be severe, and moreover, can be avoided entirely by using an appropriate multivariate nearest available matching algorithm, which, in the example, leaves only a small residual bias due to inexact matching.
Abstract: Observational studies comparing groups of treated and control units are often used to estimate the effects caused by treatments. Matching is a method for sampling a large reservoir of potential controls to produce a control group of modest size that is ostensibly similar to the treated group. In practice, there is a trade-off between the desires to find matches for all treated units and to obtain matched treated-control pairs that are extremely similar to each other. We derive expressions for the bias in the average matched pair difference due to (1) the failure to match all treated units—incomplete matching, and (2) the failure to obtain exact matches—inexact matching. A practical example shows that the bias due to incomplete matching can be severe, and moreover, can be avoided entirely by using an appropriate multivariate nearest available matching algorithm, which in the example, leaves only a small residual bias due to inexact matching.

283 citations
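
The trade-off described in the abstract above (matching every treated unit versus accepting less-than-exact matches) can be illustrated with a small sketch of greedy nearest available matching. This is only an illustration under stated assumptions: the function names, the plain Euclidean distance on the covariates, and the processing order of treated units are all choices made here, not details taken from the report.

```python
import math


def nearest_available_match(treated, controls):
    """Greedy nearest available matching.

    Each treated unit, taken in the order given, is paired with the closest
    control that has not been used yet. Every treated unit is matched as long
    as there are at least as many controls as treated units, so no treated
    unit is dropped (no incomplete matching).
    Distance is plain Euclidean distance (an assumption of this sketch).
    """
    if len(controls) < len(treated):
        raise ValueError("need at least as many controls as treated units")
    available = set(range(len(controls)))
    pairs = []
    for t_idx, t in enumerate(treated):
        # Break ties on the control index so the result is deterministic.
        best = min(available, key=lambda c: (math.dist(t, controls[c]), c))
        available.remove(best)
        pairs.append((t_idx, best))
    return pairs


def mean_covariate_gap(treated, controls, pairs):
    """Average treated-minus-control difference for each covariate.

    Nonzero values indicate residual imbalance left by inexact matching.
    """
    n_cov = len(treated[0])
    return [
        sum(treated[t][j] - controls[c][j] for t, c in pairs) / len(pairs)
        for j in range(n_cov)
    ]


# Hypothetical data: two covariates per unit.
treated_units = [(1.0, 2.0), (3.0, 1.0)]
control_pool = [(0.9, 2.1), (5.0, 5.0), (3.2, 0.8), (1.0, 2.0)]

matched = nearest_available_match(treated_units, control_pool)
print(matched)
print(mean_covariate_gap(treated_units, control_pool, matched))
```

Because every treated unit receives a match, the only remaining imbalance is the within-pair covariate gap reported by `mean_covariate_gap`, which corresponds to the inexact-matching component discussed in the abstract.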


Journal ArticleDOI
Abstract: A survey of the academic writing skills needed by beginning undergraduate and graduate students was conducted. Faculty in 190 academic departments at thirty-four U.S. and Canadian universities with high foreign student enrollments completed the questionnaire. At the graduate level, six academic disciplines with relatively high numbers of nonnative students were surveyed: business management (MBA), civil engineering, electrical engineering, psychology, chemistry, and computer science. Undergraduate English departments were chosen to document the skills needed by undergraduate students. The major findings are summarized below. Although writing skill was rated as important to success in graduate training, it was consistently rated as even more important to success after graduation. Even disciplines with relatively light writing requirements (e.g., electrical engineering) reported that some writing is required of first-year students. The writing skills perceived as most important varied across departments. Faculty members reported that, in their evaluations of student writing, they rely more on discourse-level characteristics than on word- or sentence-level characteristics. Discourse-level writing skills of natives and nonnatives were perceived as fairly similar, but significant differences between natives and nonnatives were reported for sentence- and word-level skills and for overall writing. Among the ten writing sample topic types provided, preferred topic types differed across departments. Although some important common elements among the different departments were reported, the survey data distinctly indicate that different disciplines do not uniformly agree on the writing task demands and on a single preferred mode of discourse for evaluating entering undergraduate and graduate students.

100 citations


Journal ArticleDOI
TL;DR: Cognitive theories of problem solving and suggestions made by cognitive psychologists regarding how to teach problem solving are reviewed, and theories and suggestions from creativity research are also considered.
Abstract: Cognitive theories of problem solving and suggestions made by cognitive psychologists regarding how to teach problem solving are reviewed. Theories and suggestions from creativity research are also considered. The results are summarized in a description of how high levels of proficiency in problem solving are acquired and how problem solving skills might best be taught, keeping in mind a distinction between well and ill-structured problems. The need for practice materials is discussed, and some desirable qualities of such materials are suggested. Finally, several unresolved issues regarding instructional methods are considered.

94 citations


Journal ArticleDOI
TL;DR: In this article, a new approach to assessing unexpected differential item performance (item bias or item fairness) is developed and applied to the item responses of males and females to SAT/TSWE items administered operationally in December 1977.
Abstract: A new approach to assessing unexpected differential item performance (item bias or item fairness) is developed and applied to the item responses of males and females to SAT/TSWE items administered operationally in December 1977. While the main body of the report describes the particulars of the present application and delineates the essential features of the approach, a technical appendix describes the standardization approach in detail. The primary goal of the standardization approach is to control for differences in subpopulation ability before making comparisons between subpopulation performance on test items. By so doing, it removes the contaminating effects of ability differences from the assessment of item fairness. Of the total of 195 items studied, the standardization approach identified only a handful as meriting careful review for possible content bias. Of these few, only one item exhibited a clearly unacceptable degree of unexpected differential item performance between males and females that could be attributed to content bias.

93 citations
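
The core idea in the abstract above, controlling for ability differences before comparing groups on an item, can be sketched as a weighted difference in proportions correct across matched score levels. The details here are assumptions made for illustration: matching on a total-score level, weighting by the focal group's counts at each level, and the function and variable names. The report's technical appendix, not this sketch, defines the actual procedure.

```python
def standardized_p_difference(focal, reference):
    """Weighted focal-minus-reference difference in proportion correct.

    `focal` and `reference` map a matching score level to a tuple
    (number answering the item correctly, number of examinees at that level).
    Differences are computed within each score level, so examinees are only
    compared with others of similar measured ability, and then averaged with
    the focal group's counts as weights (an assumption of this sketch).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for level, (f_correct, f_total) in focal.items():
        if level not in reference:
            continue  # no comparable reference examinees at this level
        r_correct, r_total = reference[level]
        if f_total == 0 or r_total == 0:
            continue
        difference = f_correct / f_total - r_correct / r_total
        weighted_sum += f_total * difference
        total_weight += f_total
    if total_weight == 0:
        raise ValueError("no score level is shared by both groups")
    return weighted_sum / total_weight


# Hypothetical counts at two total-score levels.
focal_group = {10: (20, 40), 20: (45, 60)}
reference_group = {10: (30, 50), 20: (80, 100)}
print(standardized_p_difference(focal_group, reference_group))
```

A value near zero suggests that examinees of comparable ability in the two groups perform similarly on the item; a large value in either direction would flag the item for review.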


Journal ArticleDOI
TL;DR: In this article, a systematic review of research on the validity of admissions measures for predicting GPA that reflects performance beyond the freshman year is presented, i.e., longer-term cumulative or independently computed post-freshman-year GPA.
Abstract: The criterion most frequently used in studies designed to assess the predictive validity of measures used in college admission has been the freshman-year GPA. It is not self-evident that the first-year GPA provides either a sufficient or a representative sample of a student's academic performance. Questions have been raised regarding the validity of admissions measures for predicting longer-term performance in college. This is the report of a systematic review of research bearing on (a) the validity of admissions measures for predicting GPA that reflects performance beyond the freshman year—i.e., longer-term cumulative or independently computed post-freshman-year GPA, such as senior-year GPA; (b) the comparative relevance and utility of freshman-year, cumulative, and independently computed post-freshman-year GPA as criteria for the validation of admissions measures. Among other things, the research reviewed lends support to the traditional practice of employing the freshman-year GPA in admissions-related predictive validity studies.

81 citations


Journal ArticleDOI
TL;DR: A theoretical model for dealing with omitted responses is presented, and two special cases are investigated.
Abstract: A theoretical model is given for dealing with omitted responses. Two special cases are investigated.

60 citations


Journal ArticleDOI
TL;DR: This paper traced major trends in the evolution of those aspects of social policies toward children that bear directly on issues of ethnic, racial, and language diversity in our society and examined the public attitudes, the intellectual assumptions, and the sociodemographic trends that have accompanied these policy developments.
Abstract: In every era in the history of U.S. public policies toward children, certain groups of children have been identified as being “at risk” and hence of social concern and responsibility. These groups consist of the physically handicapped and those with serious diseases; the emotionally disturbed; the mentally retarded; orphans; children whose mothers or fathers are permanently or temporarily absent; illegitimate, destitute, indigent, neglected, abused, and anti-social or delinquent children. Only very recently have the children belonging to specified ethnic, racial, and language groups been added as major “risk” categories and thus become a major focus of social concern and public responsibility. The source of the social concern about all the groups listed above is the alleged inability of the family (when it exists) to cope with the problems implied by membership in the group, or the public danger that follows from leaving these problems untreated. This social concern does not mean that American society has always accepted full responsibility for children in these risk categories. It has not always provided care, treatment, or rehabilitation, nor has it always sought to prevent their misfortunes. Since colonial times, however, it has at least shown concern for certain categories of children. My principal purpose in this monograph is to trace major trends in the evolution of those aspects of social policies toward children that bear directly on issues of ethnic, racial, and language diversity in our society. I also seek to examine the public attitudes, the intellectual assumptions, and the sociodemographic trends that have accompanied these policy developments. In addition, I pay some attention to the roles that the social and behavioral sciences have played with regard to such policies.

54 citations



Journal ArticleDOI
TL;DR: In this article, the authors investigated the relationship between parent belief systems regarding children's cognitive capabilities in general and with respect to their own communication handicapped preschool child in particular, and the effect of the CH child's level of functioning and position in the family constellation on parental belief systems.
Abstract: The aim of this research was to investigate parental belief systems and parental childrearing practices, relative to the intellectual development of the communication handicapped (CH) preschool child, embedded within the context of family size and ordinal position of the CH child. An equal number of families with a non-handicapped (NCH) child served as a contrast group. The basic hypothesis underlying this research was that parental practices would be directly related to parents' beliefs about child development processes and about their own child's cognitive competence. These beliefs stem from at least two factors: first, experience as a child in a family; second, experience as a parent. The parent-child relationship in this setting was viewed as interactional, where each parent acted as teacher, socializer and manager of the child's behavior. In this role, the parent also learned both from the CH child and NCH child in the family. Therein lay the interest in investigating the impact of the parent-child relationship on parents and on CH children. Specifically, the study addressed four problems: (1) the relationship between parental belief systems regarding children's cognitive capabilities in general and with respect to their own CH child in particular, (2) the effect of the CH child's level of functioning and position in the family constellation on parental belief systems, (3) the relationship between these perspectives and actual parental teaching and management strategies, and (4) the effect of such teaching/management behaviors on the CH child's level of cognitive functioning and level of representational competence. Each of these segments was identified and a path (causal) analysis model was developed from data collected with the following instruments: the Communication Strategy Interview, the Construction of the Child Interview, the Family Influences on Childrearing Interview (all of which were used to assess parental beliefs about developmental states and processes and to assess childrearing strategies and family practices); the Parent-Child Interaction Observation System [based on Sigel's (1979) distancing theory of representational competence, this was used to identify different types of parent-child interactions in structured teaching contexts and in a semistructured story-telling task]; and a series of standard evaluation tests and nonverbal Piagetian tasks that focused on transformation, memory, sequencing and anticipation to evaluate children's level of cognitive functioning and representational abilities.

17 citations


Journal ArticleDOI
TL;DR: Two methods of 'equating' tests using item response theory are compared, one using true scores, the other using estimated observed scores, and they yield almost indistinguishable results.
Abstract: Two methods of 'equating' tests using item response theory are compared, one using true scores, the other using estimated observed scores. On the data studied, they yield almost indistinguishable results. This is a reassuring result for users of IRT equating methods.

15 citations
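
As a rough illustration of the true-score method mentioned in the abstract above, the sketch below maps a true score on one test to the ability level that produces it and then reads off the corresponding true score on a second test. It assumes a three-parameter logistic model with the conventional 1.7 scaling constant, item parameters already on a common scale, and a simple bisection search; the function names and item parameters are hypothetical, and the observed-score method is not shown.

```python
import math


def p_correct(theta, a, b, c):
    """Three-parameter logistic item response function (assumed model)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def true_score(theta, items):
    """Expected number-right score at ability theta.

    `items` is a list of (a, b, c) tuples; all a values must be positive so
    that the score increases with theta.
    """
    return sum(p_correct(theta, a, b, c) for a, b, c in items)


def equate_true_score(score_x, items_x, items_y, lo=-8.0, hi=8.0, tol=1e-9):
    """Find the test Y true score equivalent to a test X true score.

    Step 1: solve true_score(theta, items_x) == score_x by bisection.
    Step 2: evaluate the test Y true score at that theta.
    """
    if not true_score(lo, items_x) < score_x < true_score(hi, items_x):
        raise ValueError("score_x is outside the attainable true-score range")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if true_score(mid, items_x) < score_x:
            lo = mid
        else:
            hi = mid
    theta = (lo + hi) / 2.0
    return true_score(theta, items_y)


# Hypothetical item parameters (a, b, c) for two three-item tests.
test_x = [(1.0, -0.5, 0.2), (1.2, 0.0, 0.2), (0.8, 0.5, 0.2)]
test_y = [(1.0, 0.0, 0.2), (1.2, 0.5, 0.2), (0.8, 1.0, 0.2)]
print(equate_true_score(2.0, test_x, test_y))
```

Scores at or below the sum of the guessing parameters have no corresponding ability under this model, which is why the function rejects them rather than extrapolating.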


Journal ArticleDOI
TL;DR: For test content that varies in the degree of relatedness to minorities, one would predict a relative advantage for those outliers that are most related to minorities. The magnitude of the differences found is not large; perhaps larger differences would be found if classifications other than race and sex, which are the most common, were used.
Abstract: “Outlier studies” identify items for which extreme differences in performance by contrasting groups occur; these extreme items are the “outliers” referred to. Review of the studies conducted on tests receiving major use in higher education reveals that though one cannot make a priori classifications of outliers with confidence, one can with reasonable confidence predict the relatively advantaged group for many verbal items if they subsequently prove to be outliers as follows: aesthetic-philosophical, human relations or female oriented content relatively favors females as opposed to males; practical affairs, science or male oriented content relatively favors males as opposed to females; science content relatively favors whites as opposed to blacks. For test content that varies in the degree of relatedness to minorities, one would predict a relative advantage for those outliers that are most related to minorities. The magnitude of the differences found is not large; perhaps larger differences would be found if classifications other than race and sex, which are the most common, were used. It has been found that differences in cultural or national origin produce larger discrepancies in item difficulty than differences in race or sex of essentially native American groups.

Journal ArticleDOI
TL;DR: This manual is designed to assist administrators of English-as-a-second-language programs in assessing students' language growth by reviewing some of the concepts and terminology to be used and suggesting and illustrating data-recording formats and methods of summarizing raw gains.
Abstract: This manual is designed to assist administrators of English-as-a-second-language programs in assessing students' language growth. It begins by reviewing some of the concepts and terminology to be used. It then goes on to suggest and illustrate data-recording formats and methods of summarizing raw gains. This is followed by an example based on bowling scores to illustrate the regression effect. An overview of a method for separating raw gain into regression and true gain components follows. It concludes with a brief discussion of a method for comparing two different groups with differing backgrounds or curricula. The appendices give details of the data and of the steps in performing the regression analyses using SPSS (Statistical Package for the Social Sciences).
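
The regression effect mentioned in the abstract above can be illustrated with a short sketch. It is not the manual's own procedure, which is not reproduced in this listing. It assumes that, absent any real change, a retest score is expected to move toward the group mean in proportion to the test-retest correlation, with the group mean and spread unchanged between occasions. The function names and all numbers are hypothetical.

```python
def expected_retest(first_score, group_mean, retest_correlation):
    """Expected second score if nothing real has changed.

    Scores far from the group mean are expected to drift back toward it;
    the lower the correlation between occasions, the stronger the drift.
    """
    return group_mean + retest_correlation * (first_score - group_mean)


def split_raw_gain(pre, post, group_mean, retest_correlation):
    """Split a raw gain into a regression part and the remainder.

    Returns (raw_gain, regression_component, adjusted_gain), where the
    regression component is the change expected from the regression effect
    alone and the adjusted gain is what is left after removing it.
    """
    raw_gain = post - pre
    regression_component = expected_retest(pre, group_mean, retest_correlation) - pre
    return raw_gain, regression_component, raw_gain - regression_component


# A bowler who scores 200 in a league averaging 150 (correlation 0.6 assumed)
# is expected to score lower next time even with no change in skill.
print(expected_retest(200, 150, 0.6))

# A student who starts below the group mean gains partly by regression alone.
print(split_raw_gain(pre=40, post=55, group_mean=60, retest_correlation=0.8))
```

Under these assumptions, a low-scoring student's raw gain overstates real growth and a high-scoring student's raw gain understates it, which is why raw gains alone can mislead comparisons between students who start at different levels.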

Journal ArticleDOI
TL;DR: A new item type was developed, incorporating features of "ill-structured" problems in a multiple-choice format. The problems are similar to previously developed scientific thinking tasks in requiring the examinee to go beyond the information provided; they resemble a variant of the logical reasoning item type, but demand somewhat more structuring by the examinee of the set of assumptions needed to solve the problem.
Abstract: A new item type was developed, incorporating features of “ill-structured” problems in a multiple-choice format. The problems are similar to previously developed scientific thinking tasks in requiring the examinee to go beyond the information provided; they resemble a variant of the logical reasoning item type, but demand somewhat more structuring by the examinee of the set of assumptions needed to solve the problem. A pretest was carried out to compare the new item type with two variants of the logical reasoning item type; a second study, using preexisting data, compared these two variants with one another. In neither case was there any indication that problems identified as “ill-structured” measured different aspects of ability than do “well-structured” problems. The new item type might be employed in tests of reasoning for the sake of increasing the variety of item types available for test construction, but would not be expected to extend the range of cognitive skills assessed by the tests.

Journal ArticleDOI
TL;DR: This article evaluated the construct validity of a videotape-based situational test, the Interpersonal Competence Instrument (ICI), specifically its convergent validity with regard to other measures of social intelligence as well as its discriminant validity with respect to indexes of general ability.
Abstract: The goal of this study was to appraise the construct validity of a videotape-based situational test, the Interpersonal Competence Instrument (ICI), specifically its convergent validity with regard to other measures of social intelligence as well as its discriminant validity with respect to indexes of general ability. The ICI was administered to college students (fraternity and sorority members, and dormitory residents) along with a battery of other devices: self-rating, peer rating, and inventory measures of social intelligence and general ability; objective tests of major components of social intelligence and general ability; and control measures. Five factors emerged in a factor analysis of the measures: general ability, age, sex, halo, and distance to the video receiver. The ICI Judgments–Accuracy score defined the general ability factor and had a borderline loading on the age factor; the ICI Replies–Effectiveness score did not appear on any factor. The Judgments–Accuracy score seems to be tapping general ability to some extent, whereas what the Replies–Effectiveness score is assessing cannot be determined at all.

Journal ArticleDOI
TL;DR: In this article, a statistical method is described and illustrated which provides confidence envelopes around item response functions and examples of 95% confidence envelope for the one-, two-, and three-parameter logistic response models are given.
Abstract: A statistical method is described and illustrated which provides confidence envelopes around item response functions. Examples of 95% confidence envelopes for the one-, two-, and three-parameter logistic response models are given. In addition, we describe N-line plots, which show the genesis of the envelope as well as the density of lines in the confidence region. These too are illustrated for the one-, two-, and three-parameter logistic models.

Journal ArticleDOI
TL;DR: In this article, the authors consider the question as to whether there are generic problem-solving skills that cut across fields or whether the skills are so embedded within specific fields that they can be identified only within the contexts of those fields.
Abstract: This review considers the question as to whether there are generic problem-solving skills that cut across fields or whether the skills are so embedded within specific fields that they can be identified only within the contexts of those fields. To answer this question, an attempt was made to define both “problems” and their “solution.” Then the evidence for the existence of general problem solving skills that are independent from any specific field was examined. Then the analyses of skills within disciplines were reviewed to see if the skills are common across fields. Finally, the implications of the research for the assessment of problem-solving skills were discussed. In general, it was concluded that similar skills are used in different fields but that their implementation is so dependent on mastery of the specific fields that any assessment of problem-solving skills would best be conducted within the fields.

Journal ArticleDOI
TL;DR: The results of the study support the validity of the TSE as a measure of oral language proficiency and provide some progress toward the setting of appropriate standards of oral language proficiency within the health-related professions.
Abstract: The ability to communicate effectively is considered critical to successful performance in numerous occupations, including many health-related professions. Oral proficiency is necessary in some degree for performing such tasks as interacting with medical colleagues and counseling or instructing patients. The importance of communicative competence is evidenced by a growing number of states' and certification agencies' requirements for minimum oral proficiency in a variety of occupations. The increasing interest in oral proficiency during the past decade prompted Educational Testing Service (ETS) to undertake the development of the Test of Spoken English (TSE), a standardized test of speaking proficiency of nonnative speakers of English. The test has been validated for the selection of nonnative teaching assistants applying to U.S. institutions. The objectives of the study reported here were to: (a) provide additional validation of the TSE for the selection and certification of nonnative health professionals, (b) establish a set of procedures for determining standards of proficiency on the TSE, and (c) try out these procedures in selected health professions in which the test could be used. Groups of judges who were practitioners in four health professions–nursing, pharmacy, medicine, and veterinary medicine–judged the acceptability of the speaking performance of examinees who had taken the TSE. Each group was asked to provide, for its profession, ratings for three distinct situations, such as communicating with professional colleagues, teaching in the particular field, or working in a hospital. Judges were asked to indicate whether each examinee was at least minimally acceptable to function in each situation. Judges' ratings were then related to the examinees' TSE scores. A larger group of consumers of medical care gave more global ratings to the same examinees for each of the four professions. 
Background data suggested that the groups of judges were heterogeneous with regard to a number of relevant personal characteristics and experiences, such as degree of exposure to nonnative speakers. This diversity suggests that judges represented a variety of points of view about necessary language proficiency. Analyses of judges' ratings revealed a substantial degree of consistency. Even though only relatively small segments of test performance were rated, judges tended to assign the same ratings when asked to rate a second sample of test performance for the same examinees. Although the data showed that the procedures applied can sometimes produce different results from one occasion to the next, the relationships of judges' ratings to TSE scores generally exhibited considerable stability. Alternative methods were applied to determine possible ranges of cutoff scores on the TSE. One of these methods involved a consideration of the consequences of different decision outcomes, e.g., whether the certification or licensing of an unacceptable speaker is more or less serious than failing to certify an acceptable speaker. A range of possible cutoff scores was computed for each situation within each profession. The results of the study support the validity of the TSE as a measure of oral language proficiency and provide some progress toward the setting of appropriate standards of oral language proficiency within the health-related professions.

Journal ArticleDOI
TL;DR: This paper found that foreign ESL examinees are more proficient at processing the discipline-specific content of GRE Subject Tests in their respective fields than in processing the more general verbal content of the GRE Verbal Test.
Abstract: The Graduate Record Examinations (GRE) program at Educational Testing Service (ETS) offers tests of subject-matter achievement (GRE Subject Tests) in 17 fields. During the period between June 1982 and September 1984, more than 19,000 non-U.S. citizens and 78,000 U.S. citizens took one of the following Subject Tests, listed in descending order with respect to “quantitative vs verbal emphasis” in the corresponding fields of study: Engineering, Mathematics, Computer Science, Physics, Chemistry, Economics, Geology, Biology, Education, Psychology, Music, Political Science, Sociology, French, Spanish, History, and Literature in English. Substantial percentages of the Subject Test takers took the GRE General Test on the same testing date. The GRE General Test measures developed verbal (V), quantitative (Q), and analytical (A) abilities. This study was undertaken to provide information regarding the performance of U.S. and non-U.S. citizens on the Subject Tests, and the relationship of selected English-proficiency-related background variables to the test performance of non-U.S. citizens. It was also concerned with exploring the hypothesis that foreign ESL examinees (for whom English is a second language) are likely to be more proficient at processing the discipline-specific content of GRE Subject Tests in their respective fields than in processing the more general verbal content of the GRE Verbal Test. Detailed profiles of U.S. and non-U.S. Subject Test takers were developed to provide comparative information on self-reported relative English proficiency (better communication in English or BCE status vs better communication in some other language) and other background characteristics: sex, age, educational level, undergraduate origin (U.S. vs other), and undergraduate major. Profiles of GRE Subject Test means were developed for U.S. and non-U.S. 
examinees, generally, and in classifications that introduced some controls for differences in English language background linked to country of origin. Non-U.S. examinees, generally, had higher means than U.S. examinees on Subject Tests in Spanish, French, Music, Psychology, Mathematics, Computer Science, Chemistry, Physics, and Economics, and slightly lower means in Engineering and Sociology. U.S. citizens had clearly higher means in Geology, Biology, Education, Political Science, History, and Literature. Based on samples of Subject Test/General Test takers, foreign ESL examinees performed better, relative to U.S. examinees, on Subject Tests than on the GRE Verbal Test, supporting the hypothesis that they should be more proficient at processing discipline-specific English language test content than at processing general English language test content. A major implication of the findings is that scores on the GRE Subject Tests appear to be useful for assessing relative levels of subject-matter mastery for examinees differing widely in linguistic-cultural-educational background. Research is needed to determine the extent to which the comparative academic performance of U.S. students and foreign students is consistent with their comparative performance on the GRE Subject Tests.

Journal ArticleDOI
TL;DR: The role of schooling in the development of cognitive abilities and of developed abilities in the processes of school learning is discussed in this paper, as well as the role of cognitive styles as characteristic modes of organization and regulation in information processing which afford unifying self-consistency in learning.
Abstract: As commentary for a special issue on Education and Human Ability of the British journal Educational Analysis, two recurrent themes in the volume are underscored as well as three critical educational issues that were addressed only peripherally there. The two recurrent themes deal with the roles of knowledge and of context in school learning. The three peripheral issues given central focus in the commentary are (1) the role of schooling in the development of cognitive abilities and of developed abilities in the processes of school learning; (2) the role of cognitive styles as characteristic modes of organization and regulation in information processing which afford unifying self-consistency in learning; and, (3) the problem of the match between features of the instruction and functional characteristics of the learner. The commentary thus centers on some of the salient roles in education and learning of knowledge, abilities, context, and style – especially as they merge in the problem of the match.

Journal ArticleDOI
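TL;DR: This report analyzes the background data for GRE test takers using log-linear models for multi-way cross-classifications, computes ethnic group GRE verbal and quantitative score means standardized for background data, and analyzes the non-responses to the questions in the GRE background questionnaire.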
Abstract: This project has three separate components. The first is an analysis of the background data for GRE test takers using log-linear models for multi-way cross-classifications. The second is the computation of ethnic group GRE verbal and quantitative score means that are standardized for background data. The third is an analysis of the non-responses to the questions in the GRE background questionnaire. The data used in the analyses are from the 1975–76 and 1976–77 testing years. An appendix is included that gives versions of the published tables separately for graduate students and nongraduate students. This report was intended as an internal working paper for use by ETS staff.

Journal ArticleDOI
TL;DR: A context is proposed that not only focuses career exploration in a logical way, beginning with the decision maker and moving to occupations, but also helps organize the process of decision making.
Abstract: A context is proposed that not only focuses career exploration in a logical way, beginning with the decision maker and moving to occupations, but also helps organize the process of decision making. Decision makers are seen as part of the context, for they have values, aptitudes, and resources that are relevant to choices of occupations. For career decision making, occupations are construed along dimensions that are most useful to the decision maker. These dimensions are the ones that, in the occupation, correspond to the values, aptitudes, and resources dimensions of the decision makers. They are rewards, requisites, and investments. The goal of the decision, then, is the maximization of values satisfaction within the limits of aptitudes and resources. Information also requires a place in the decision-making process, since decision makers require two classes of information: information about occupations and information about themselves. Applications of the context include development of a curriculum for career decision making, improvement of occupational information, and improved decision making.

Journal ArticleDOI
TL;DR: This article examined the relationship between five methods of test preparation and test performance as measured by Graduate Management Admission Test (GMAT) Verbal (V), Quantitative (Q) and Total (T) scores.
Abstract: This study sought to examine the relationship between five methods of test preparation and test performance as measured by Graduate Management Admission Test (GMAT) Verbal (V), Quantitative (Q) and Total (T) scores. Data on method of test preparation were obtained through voluntary examinee response to the following five questions which appeared on the answer sheets: In preparing for this test, did you: Study the sample questions in the GMAT registration bulletin? Work through an actual GMAT published by ETS? Use a book not published by ETS on how to prepare for the GMAT? Attend a test preparation or coaching course for the GMAT? Undertake on your own any review of mathematics? One sample of first-time test takers and one sample of second-time test takers were selected from among the 185,525 1981–82 GMAT examinees who were U.S. citizens. Multiple regressions using GMAT scores as dependent variables and test preparation, undergraduate grade point average (UGPA) and sex as independent variables were computed separately for first-time examinees who were members of the Afro-American/Black, Caucasian/White, Oriental/Asian and Spanish-American U.S. citizen subgroups. Regressions (including first GMAT scores as independent variables) were also computed for all examinees in the sample who were taking the GMAT for the second time. FIRST-TIME EXAMINEES The percents of first-time examinees electing to use each method varied, but the rank order of the frequency of using each method was consistent across the subgroups. The largest proportion of first-time examinees reported that they had prepared by reviewing the bulletin, followed in descending order by using a test preparation book not prepared by ETS, undertaking their own study of mathematics, working through an actual GMAT, and attending a test preparation course. 
The study also found that examinees electing to use the various methods of preparation did not differ appreciably in previous academic performance as measured by undergraduate grade point average, but did vary slightly in age and amount of work experience. Results of the multiple regression analyses based on data from first-time examinees differed across the four subgroups. The size of the coefficients associated with each method of preparation, as well as the corresponding standard errors, varied among the four subgroups. The same was true for the interaction effects between pairs of methods. The expected difference in verbal score for a “yes” response to “Studying a test review book not published by ETS,” when the effects of the other independent variables were held constant, ranged from 1.6 to 3.2 scaled score points and was significant for all four subgroups. The difference in verbal scores for “Studying the Bulletin” ranged from 1.3 scaled score points for Afro-American/Blacks to 4.0 scaled score points for Oriental/Asians. The effects of using a review book or taking a review course ranged between .4 and 1.9 points on verbal and quantitative scores. Negative effects were associated with examinees' own review of mathematics. These effects were attributed to a confounding between self-selection and method of preparation. SECOND-TIME EXAMINEES The sizes of the effects associated with each method of preparation for second-time examinees were considerably less than those obtained using data from first-time examinees. When previous test performance was held constant, the effect of using each of the methods of preparation was small. In fact, only the effects of using a test preparation book and of attending a test preparation course were significantly different from zero. The mean GMAT scores between second-time examinees who did and did not use each method of preparation differed inconsistently and only slightly. 
Additionally, the magnitude of gain over the first administration score was very similar between examinees who used and did not use each method. CONCLUSIONS This study has indicated that differences in GMAT scores do exist among examinees using different methods of preparing for the examination. However, it was shown that when initial ability, as measured by GMAT first score, was controlled, the sizes of the effects of studying the GMAT bulletin, working through an actual GMAT, and reviewing mathematics were not significantly different from zero. The effects of the methods on GMAT scores of first-time examinees, for whom a previous score was not available to use as a covariate, were larger. In those analyses (in which self-reported UGPA was used as a less effective control on ability), the largest effects associated with any method over the other were about 4, 3 and 33 verbal, quantitative and total score points, respectively. However, the effects of using these methods are confounded with the characteristics of examinees who choose to use each method. The effects resulted from a combination of self-selection and preparation. There do appear to be relationships between method of preparation and test scores. However, it must be emphasized that it does not necessarily follow that using any of the methods of preparation causes an increase in scores.

Journal ArticleDOI
Neil J. Dorans1
TL;DR: The analytical decomposition demonstrates how the effects of item properties, test properties, individual examinee responses and rounding rules combine to produce the item deletion effect on the equating/scaling function and candidate scores.
Abstract: The purpose of this report is to present a formal analysis of the effects of item deletion on equating/scaling functions and reported score distributions. The phrase “item deletion” shall be used to refer to the process of changing the original key of a flawed item to either all options correct, including omits, or to no options correct, i.e., not scoring the flawed item. There are two aspects to the present analysis. The first aspect is analytical, focusing on the development of a formal model for the item deletion effect by decomposing it into its constituent elements. The second component of the analysis is empirical, involving the use of actual data to illustrate and supplement the analytical results. The analytical decomposition demonstrates how the effects of item properties, test properties, individual examinee responses and rounding rules combine to produce the item deletion effect on the equating/scaling function and candidate scores. In addition to demonstrating how the deleted item's psychometric properties can affect the equating function, the analytical component of the report examines the effects of not scoring vs. scoring all options correct and the effects of re-equating vs. not re-equating, as well as the interaction between the decision to re-equate or to not re-equate and the scoring option chosen for the flawed item. The empirical portion of the report uses data from the May 1982 administration of the SAT, which contained the circles item, to illustrate the effects of item deletion on reported score distributions and equating functions. The empirical data verify what the analytical decomposition predicts.

Journal ArticleDOI
TL;DR: The authors found that children who broadly relied upon anticipation of meanings as their entry into text exhibited reading strategies that were not well suited to the constraints of comprehension items, whereas children whose styles of learning were more linear and more narrowly focused exhibited strategies that were better matched to the task.
Abstract: Follow-up data consisting of teacher ratings and reading achievement scores (CAT) were collected for twenty children whose early reading progress had been extensively documented in previous observational research. The children in the follow-up sample represented two contrasting stylistic groups with distinctive strategies for reading. The data indicated no difference between the groups in general reading ability by the end of the primary grades. There was evidence, however, of stylistic influences on children's responses to measures (subtests) of reading comprehension. Children who broadly relied upon anticipation of meanings as their entry into text exhibited reading strategies that were not well suited to the constraints of comprehension items, whereas the children whose styles of learning were more linear and more narrowly focused exhibited strategies that were better matched to the task.

Journal ArticleDOI
TL;DR: The Descriptive Tests of Language Skills (DTLS) are a set of tests of reading comprehension, logical relationships, vocabulary, usage, and sentence structure developed by the College Board.
Abstract: The Descriptive Tests of Language Skills (DTLS), comprising tests of reading comprehension, logical relationships, vocabulary, usage, and sentence structure developed by the College Board, are designed to help colleges assess the language skills of entering students for placement and instructional planning purposes. A complete description of the DTLS and of field trials of the tests may be found in the Guide to the Use of the Descriptive Tests of Language Skills (College Entrance Examination Board, 1978). Performance data on the DTLS for entering college students were collected through a pilot study of 4,234 students in 16 colleges (see Table 1).

Journal ArticleDOI
TL;DR: In this paper, the authors identify generic skills in interpersonal relations and examine the implications of research for attempts to assess these skills. Using a developmental framework, they review three areas of research: the social development of children and adolescents, clinical studies of interpersonal competence, and studies of effective leadership.
Abstract: This review was intended to identify generic skills in interpersonal relations and to examine the implications of research for attempts to assess these skills. Using a developmental framework, three areas of research were reviewed: the social development of children and adolescents, clinical studies of interpersonal competence, and studies of effective leadership. Six skills appeared in this literature: use of basic social forms, common interactions with others, constructive assertiveness, internal monitoring, emotional expression, and the coordination of group activities.

Journal ArticleDOI
TL;DR: This article found that the effects of coaching on GRE verbal ability scores were not significantly related to the amount of coaching that examinees received, whereas effects on analytical ability scores were significantly related to the length of coaching programs.
Abstract: The controversy over the effectiveness of coaching for standardized admission tests has been fueled by studies that have sought dichotomous answers to questions that are more appropriately posed as ones of degree. Instead of asking categorically “Does coaching work?”, researchers would seem better advised to ask “How much time and effort, devoted to what kinds of coaching experiences, produce how much improvement in test performance?” To explore some relational answers, information on test preparation activities was collected from a large representative sample of candidates who took the GRE Aptitude Test in June 1980. About 3 percent of these candidates indicated that they had attended formal coaching programs for one or more sections of the test and provided information on the length, cost, and offerer of the courses. After adjusting for differences in the background characteristics of coached and uncoached students, effects on test scores were related to the length, the cost, and the type of programs offered. The effects on GRE verbal ability scores were not significantly related to the amount of coaching that examinees received. Quantitative coaching effects appeared to increase only slightly with time, but the relationship was not statistically significant. Effects on analytical ability scores, on the other hand, were related significantly to the length of coaching programs, through improved performance on two analytical item types, analysis of explanations and logical diagrams. These item types were shown previously to be susceptible to improvement through formal instructional intervention and have since been deleted from the test. Test performance was related to the kinds of coaching programs examinees attended only for the quantitative section of the test. 
With respect to the possibility that some kinds of test takers might profit more than others from coaching, exploratory analyses suggested that examinees intending to pursue higher-level degrees may have benefited more than lower-level degree seekers. This finding is consistent with the conjecture that highly motivated test takers may achieve greater effects from coaching than less motivated examinees. Overall, the data suggest that, when compared with the two highly susceptible item types that have been removed from the GRE Aptitude Test, the test item types in the current version of the test (now called the GRE General Test) appear to show relatively little susceptibility to formal coaching experiences of the kinds considered here.

Journal ArticleDOI
TL;DR: In this article, the authors examined the statistical and institutional influences on the prediction of first-year college grades, using data from College Board validity studies and the College Handbook, and investigated the characteristics that were associated with the greater or lesser efficiency of the predictors (SAT Verbal and Mathematical and high school grades).
Abstract: This study examined the statistical and institutional influences on the prediction of first-year college grades, using data from College Board validity studies and the College Handbook. The criterion was the size of the multiple correlation between academic predictors and first-year college grades. The independent variables were the statistical data of the validity study and college characteristics. In general, the extent of the variation of the academic ability of the students was positively related to the size of the multiple correlation, and the heterogeneity of the programs and experience of college was negatively related. Further analyses investigated the characteristics that were associated with the greater or lesser efficiency of the predictors (SAT Verbal and Mathematical and high school grades).

Journal ArticleDOI
TL;DR: The authors identified generic competencies in communication that may be amenable to assessment, such as listening, empathy, non-verbal communication, and expressive abilities, focusing on research-based literature.
Abstract: This review attempts to identify generic competencies in communication that may be amenable to assessment. Concentrating on research-based literature, the review examines the models, methods, and research reviews that have been used to study communication. From these sources four areas of skill were identified: listening, empathy, non-verbal communication, and expressive abilities. The research relating to the definitions of subskills within these areas is discussed, as well as the difficulties in assessing the skills.

Journal ArticleDOI
TL;DR: In this article, the authors discuss some of the methodological problems inherent in existing methods for determining disparities in special education placement and offer an alternative to presently-used methods, which can be used for the simultaneous comparison of disparity across many districts and several diagnostic categories.
Abstract: The discovery of disproportionate minority enrollments in special education classes has been a subject of considerable concern to federal, state, and local education officials alike for the past decade. Because the existence of disproportions is frequently thought to result from discriminatory placement practices on the part of local school districts, many state agencies are implementing methods for detecting and taking action against offending districts. This paper discusses some of the methodological problems inherent in existing methods for determining disparities in special education placement and offers an alternative to presently-used methods. The alternative offered can be used for the simultaneous comparison of disparity across many districts and several diagnostic categories.