
Showing papers in "ETS Research Report Series in 2015"


Journal ArticleDOI
TL;DR: A comprehensive review of the test conditions that have been shown to influence motivation is provided in this article, where the authors identify a number of circumstances under which motivation can be enhanced or diminished.
Abstract: There is a growing concern that when scores from low-stakes assessments are reported without considering student motivation as a construct of interest, biased conclusions about how much students know will result. Low motivation is a problem particularly relevant to low-stakes testing scenarios, which may be low stakes for the test taker but have considerable consequences for teachers, school districts, or educational and governmental institutions. The current review addresses the impact of motivation on assessment scores and reviews research that has identified methods for minimizing the error introduced by unmotivated test takers. A comprehensive review of the test conditions that have been shown to influence motivation is provided. In addition, the review identifies a number of circumstances under which motivation can be enhanced or diminished. The benefits and limitations of the various measurement techniques that have been used to mitigate the negative impact of low-motivation test takers on score interpretation are discussed.

71 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a comprehensive review of existing frameworks, definitions, and assessments of civic-related constructs from approximately 30 projects relevant to higher education, and include a discussion of the challenges related to assessment design and implementation.
Abstract: Civic learning is increasingly recognized as important by the higher education and workforce communities. The development of high-quality assessments that can be used to evaluate students' civic learning during the college years has become a priority. This paper presents a comprehensive review of existing frameworks, definitions, and assessments of civic-related constructs from approximately 30 projects relevant to higher education, and includes a discussion of the challenges related to assessment design and implementation. Synthesizing information from the review, we propose an assessment framework to guide the design of a next-generation assessment of individuals' civic learning that takes advantage of recent advances in assessment methods. The proposed framework identifies 2 key domains within civic learning: civic competency and civic engagement. Civic competency encompasses 3 areas (civic knowledge; analytic skills; and participatory and involvement skills), and civic engagement also captures 3 areas (motivations, attitudes, and efficacy; democratic norms and values; and participation and activities). We discuss item formats and task types that would ensure fair and reliable scoring for the assessment. The review of definitions of civic learning and its components developed by organizations, the proposed assessment framework, and assessment considerations presented here have potential benefits for a range of higher education institutions. This includes institutions that currently have students engaged in relevant curricular or cocurricular activities and also institutions that would find assessments of civic competency and engagement helpful in program development or in evaluating students' accomplishments.

66 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a framework intended to link the following assessment development concepts into a systematic framework: evidence-centered design (ECD), scenario-based assessment (SBA), and assessment of, for, and as learning.
Abstract: This paper presents a framework intended to link the following assessment development concepts into a systematic framework: evidence-centered design (ECD), scenario-based assessment (SBA), and assessment of, for, and as learning. The context within which we develop this framework is the English language arts (ELA) for K-12 students, though the framework could easily be applied to cover reading, writing, and critical thinking skills from pre-K through college. Central to the framework is the concept of a key practice, drawn from constructivist learning theory, which emphasizes the purposeful social context within which skills are recruited and organized to carry out complex literacy tasks. We argue that key practices provide a key link between existing CBAL™ ELA learning progressions (defined as part of a student model for literacy skills) and the structure of well-designed SBAs. This structure enables us to design assessments that model a key practice, supporting the systematic creation of task sequences that can be used to support both instruction and assessment.

37 citations



Journal ArticleDOI
TL;DR: In this paper, the authors provide a detailed overview of previous (empirical) research, theories, and frameworks of communicative competence to review the role of pragmatics as an essential component of L2 communicative language ability.
Abstract: This review paper constitutes the first step within a larger research effort to develop an interactive pragmatics learning tool for second and foreign language (L2) learners and users of English. The tool will primarily endeavor to support pragmatics learning within the language use domain “workplace.” Given this superordinate objective, this paper is subdivided into 2 parts. In the first section, we provide a detailed overview of previous (empirical) research, theories, and frameworks of communicative competence to review the role of pragmatics as an essential component of L2 communicative language ability. A principled, systematic, and exhaustive literature search was conducted via key word searches, and the selected literature was categorized and coded using NVivo 10 software. Next, 12 distinct models of communicative language ability that contain components of pragmatic knowledge were identified and analyzed. The commonly identified constitutive components were then reconceptualized into a proposed construct of pragmatic competence. The challenges of operationalizing pragmatic competence in both instruction and assessment are discussed. The second part of the paper constitutes a domain analysis of pragmatics in the language use domain “workplace.” First, the literature is reviewed for communicative tasks and activities that feature prominently in different workplace settings across various English-speaking countries. Then, we suggest and exemplify different model task types that can be employed in the context of learning and assessment materials that aim to foster pragmatic-functional awareness in both English as a foreign language (EFL)/English as a second language (ESL) learners and first language (L1) speakers alike.

29 citations



Journal ArticleDOI
TL;DR: The authors examine key factors contributing to young Hispanic dual language learners' academically at-risk status, as well as the emerging research base on strategies for supporting the learning and development of DLLs in preschool and the early primary grades.
Abstract: Dual language learners, or DLLs, may have greater school readiness needs due to the key role English oral language skills play in the development of emerging literacy skills in English and their overall academic achievement. This especially can be the case if children's capacity to benefit from classroom instruction and interact with teachers and fellow students is dependent on their English language proficiency. This policy report examines key factors contributing to young Hispanic DLLs' academically at-risk status, as well as the emerging research base on strategies for supporting the learning and development of DLLs in preschool and the early primary grades. Also addressed are the practical, on-the-ground implementation challenges to be addressed if early education programs are to incorporate these strategies.

28 citations


Journal Article
TL;DR: The authors provide a comprehensive literature review on the development of key argumentation skills to lay a foundation for a framework of the key practice, discuss and debate ideas, which is centrally involved in the expectations for academic reading and writing.
Abstract: In this paper, we provide a comprehensive literature review on the development of key argumentation skills to lay a foundation for a framework of the key practice, discuss and debate ideas, which is centrally involved in the expectations for academic reading and writing. Specifically, the framework includes 5 phases of core activities and related sets of argumentation skills, and for each set of skills, a provisional learning progression is designed to identify qualitative shifts in the development of critical argumentation skills informed by the developmental literature. These learning progressions may have the potential to support teachers' instructional decisions that effectively scaffold their students to the next level.

27 citations


Journal ArticleDOI
TL;DR: The results of the study indicate that keystroke log features vary considerably in stability across testing occasions and display somewhat different patterns of feature–human correlation across genres and topics.
Abstract: In this report, we examine the feasibility of characterizing writing performance using process features derived from a keystroke log. Using data derived from a set of CBAL™ writing assessments, we examine the following research questions: (a) How stable are the keystroke timing and process features across testing occasions? (b) How consistent are the patterns of feature–human correlation across genres and topics? (c) How accurately can we predict human ratings on writing fundamentals using a combination of the keystroke timing and process features, and what are the contributions of each feature to the reliable variance in the human ratings? (d) If we train a predictive model on one prompt, how well do its predictions generalize to the other prompts of the same or different genre? The results of the study indicate that keystroke log features vary considerably in stability across testing occasions and display somewhat different patterns of feature–human correlation across genres and topics. However, using the most stable features, we can obtain moderate to strong prediction of human essay scores, and those models generalize reasonably well across prompts though more strongly within than across writing genres.

27 citations
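The cross-prompt generalization question in the keystroke study above can be sketched in miniature: fit a linear model predicting human scores from process features on one prompt, then check how well its predictions correlate with human scores on another prompt. The features, data, and effect sizes below are entirely invented for illustration; the report's actual feature set and modeling approach differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_prompt(n=200):
    # Two hypothetical stable process features (e.g., burst length, pause rate)
    X = rng.normal(size=(n, 2))
    # Human score loads on both features, plus rating noise
    y = 0.6 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(scale=0.5, size=n)
    return X, y

def fit_linear(X, y):
    # Ordinary least squares with an intercept column
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta

X_train, y_train = simulate_prompt()
X_other, y_other = simulate_prompt()   # a different prompt, same genre

beta = fit_linear(X_train, y_train)
r_within = np.corrcoef(predict(beta, X_train), y_train)[0, 1]
r_across = np.corrcoef(predict(beta, X_other), y_other)[0, 1]
print(round(r_within, 2), round(r_across, 2))
```

In this toy setup the two prompts share the same feature-score relationship, so the model transfers well; the report's finding is that real transfer is stronger within than across writing genres.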


Journal ArticleDOI
TL;DR: A large sample collected from middle school students in the United States is used to investigate the factor structure of the writing process features gathered from keystroke logs and the association of that latent structure with the quality of the final product.
Abstract: In educational measurement contexts, essays have been evaluated and formative feedback has been given based on the end product. In this study, we used a large sample collected from middle school students in the United States to investigate the factor structure of the writing process features gathered from keystroke logs and the association of that latent structure with the quality of the final product (i.e., the essay text). The extent to which those process factors had incremental value over product features was also examined. We extracted 29 process features using the keystroke logging engine developed at Educational Testing Service (ETS). We identified 4 factors that represent the extent of writing fluency, local word-level editing, phrasal/chunk-level editing, and planning and deliberation during writing. We found that 2 of the 4 factors—writing fluency, and planning and deliberation—significantly related to the quality of the final text, whereas the 4 factors altogether accounted for limited variance in human scores. In 1 of the 2 samples studied, the keystroke-logging fluency factor added incrementally, but only marginally, to the prediction of human ratings of text-production skills beyond product features. The limited power of the writing process features for predicting human scores and the lack of clear additional predictive value over product features are not surprising given that the human raters have no knowledge of the writing process leading to the final text and that the product features measure the basic text quality specified in the human scoring rubric. Study limitations and recommendations for future research are also provided.

24 citations


Journal ArticleDOI
TL;DR: The TOEFL Junior test was developed to address the increasing need for objective measures of English language proficiency for young adolescent learners, who are being introduced to English as a second or foreign language at a much younger age than ever before as discussed by the authors.
Abstract: This paper presents the theoretical and empirical foundations of the TOEFL Junior® assessment and its development process. The TOEFL Junior test was developed to address the increasing need for objective measures of English language proficiency for young adolescent learners, who are being introduced to English as a second or foreign language at a much younger age than ever before. This paper presents the test purposes and intended uses, target population, target language use domains, and test constructs of the TOEFL Junior test. Also included is a description of the overall test structure and scoring system, which demonstrates how the constructs are operationalized. Finally, we outline research topics to support the interpretive argument of the use of the test. This document is expected to serve as a reference point during investigations of validity evidence to support the intended test uses over time.

Journal ArticleDOI
TL;DR: This article developed a model of conducting research and inquiry as a key literacy practice in the English language arts (ELA) and identified a set of activities and skills that are critical for participating in research; each skill is accompanied by a setof provisional learning progressions, which outlines tentative predictions about the qualitative changes in a skill that develop over time with appropriate instruction.
Abstract: Current educational standards call for students to engage in the skills of research and inquiry, with a focus on gathering evidence from multiple information sources, evaluating the credibility of those sources, and writing an integrated synthesis that cites evidence from those sources. Opportunities to build strong research skills are critical, yet empirical research demonstrates that students from Grades K–16 struggle with inquiry tasks, particularly in online environments. There is a need to create models that will support teachers in developing students' research skills and can be used to develop reliable and valid assessments of such skills while aligning with standards. Under the CBAL™ research initiative, we have developed a model of conducting research and inquiry as a key literacy practice in the English language arts (ELA). In this paper, we draw on literature from the cognitive and learning sciences—including work in discourse processing, science education, educational technology, and information literacy—to provide the theoretical background for this key practice. We identify a set of activities and skills that are critical for participating in research; each skill is accompanied by a set of provisional learning progressions, which outlines tentative predictions about the qualitative changes in a skill that develop over time with appropriate instruction. These learning progressions and their relation to the key practice can be leveraged in the design of cognitively based assessments of research and inquiry that are sensitive to students' developmental level. We conclude with an example design for such an assessment to illustrate how key practices and learning progressions can be integrated to support measurement of research and inquiry skills.

Journal ArticleDOI
TL;DR: This paper finds that an earnings penalty is associated with bilingualism: people who are bilingual often earn less than people who are monolingual in similar jobs.
Abstract: Although it is commonly thought that people who are bilingual have an advantage in the labor market, studies on this topic have not borne out this perception. The literature, in fact, has found an earnings penalty is associated with bilingualism—people who are bilingual often make less than people who are monolingual in similar jobs. This report reviews those studies and introduces a new set of studies that found different outcomes for bilingual people in terms of education and earnings. In this report I examine why the prior and present studies differ so greatly and what this means for education policy.

Journal ArticleDOI
TL;DR: Building and sharing knowledge is a foundational literacy activity that enables students to learn and communicate what they read in texts as discussed by the authors, which is a strategic process that involves the integration of five key components or phases.
Abstract: In this paper we provide the rationale and foundation for the building and sharing knowledge key practice for the CBAL™ English language arts competency model. Building and sharing knowledge is a foundational literacy activity that enables students to learn and communicate what they read in texts. It is a strategic process that involves the integration of five key components or phases. Before reading, students activate their relevant background knowledge to help set learning goals, identify relevant information, and ask guiding questions that set the context for learning. During reading, students understand the text by using a host of strategies to construct a coherent mental model of the text content that is consistent with their background knowledge. Students clarify meanings of unknown words and concepts as they engage in metacognitive and self-regulated learning. After reading, students consolidate what they have read by using a variety of reading strategies that strengthen the representation in long-term memory. Finally, students convey what they have read in writing, speaking, or other representational formats to reflect communication goals and the intended audience. Collectively, the building and sharing knowledge key practice is intended to both model skilled performance and help identify component skill weakness. In this paper we outline the major features of the key practice as well as address potential advantages and challenges of the approach.

Journal ArticleDOI
TL;DR: In this article, the authors describe the initial automated scoring results that were obtained using the constructed responses from the writing and speaking sections of the pilot forms of the TOEFL Junior Comprehensive test administered in late 2011.
Abstract: This report describes the initial automated scoring results that were obtained using the constructed responses from the Writing and Speaking sections of the pilot forms of the TOEFL Junior® Comprehensive test administered in late 2011. For all of the items except one (the edit item in the Writing section), existing automated scoring capabilities were used with only minor modifications to obtain a baseline benchmark for automated scoring performance on the TOEFL Junior task types; for the edit item in the Writing section, a new automated scoring capability based on string matching was developed. A generic scoring model from the e-rater® automated essay scoring engine was used to score the email, opinion, and listen-write items in the Writing section, and the form-level results based on the five responses in the Writing section from each test taker showed a human–machine correlation of r = .83 (compared to a human–human correlation of r = .90). For scoring the Speaking section, new automated speech recognition models were first trained, and then item-specific scoring models were built for the read-aloud, picture narration, and listen-speak items using preexisting features from the SpeechRaterSM automated speech scoring engine (with the addition of a new content feature for the listen-speak items). The form-level results based on the five items in the Speaking section from each test taker showed a human–machine correlation of r = .81 (compared to a human–human correlation of r = .89).

Journal ArticleDOI
TL;DR: The authors used hierarchical linear models to model growth in test performance as a function of the time interval between test administrations and found a positive, statistically significant relationship; that is, test takers with longer intervals between retesting exhibited greater gains than did test takers who retested at shorter intervals.
Abstract: Standardized tests are often designed to provide only a snapshot of test takers' knowledge, skills, or abilities at a single point in time. Sometimes, however, they are expected to serve more demanding functions, one of which is assessing change in knowledge, skills, or ability over time because of learning effects. The latter is the case for the newly developed TOEFL Junior® Standard test, which measures improvement in young learners' proficiency in English as a foreign language. In this study, we used nonexperimental-repeated measures data from approximately 4,600 students from multiple countries to examine the extent to which observed patterns in within-individual changes in test scores were consistent with changes in underlying language proficiency because of learning. Because most students were actively participating in English language learning programs, the time interval between test administrations, which varied among students, served as a proxy for the extent of English language learning opportunities. We used hierarchical linear models to model growth in test performance as a function of the time interval between test administrations and found a positive, statistically significant relationship; that is, test takers with longer intervals between retesting exhibited greater gains than did test takers who retested at shorter intervals. The estimated relationship for the total score corresponded to between .16 and .24 test standard deviations of growth per year, depending on model specification. The findings are robust to sensitivity analyses that explore potential biasing factors. Overall, the findings are consistent with the hypothesis that the TOEFL Junior Standard test is capable of reflecting change in English language proficiency over time.
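The core quantity in the growth study above — score gain per year of learning opportunity, expressed in test standard deviations — can be illustrated with a much-simplified sketch. The study fits hierarchical linear models; this ordinary least-squares analogue, with invented sample values (true growth rate, test SD, noise level), only shows what is being estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4600
interval_years = rng.uniform(0.1, 2.0, size=n)   # time between administrations
true_growth_sd = 0.20                            # SD units per year (illustrative)
score_sd = 25.0                                  # test SD (illustrative)

# Simulated within-student score gains: growth plus measurement noise
gain = true_growth_sd * score_sd * interval_years + rng.normal(scale=15.0, size=n)

# OLS slope of gain on interval, converted back to SD units per year
slope = np.polyfit(interval_years, gain, 1)[0]
growth_per_year_sd = slope / score_sd
print(round(growth_per_year_sd, 2))
```

With enough students, the recovered slope lands near the simulated growth rate; the hierarchical model in the report additionally accounts for the nesting of repeated scores within students and countries.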


Journal ArticleDOI
Abstract: The Study of Teaching and Learning in Accelerated Nursing Degree Programs explores how nurse educators are adapting their teaching practices for accelerated, second-degree nursing program students. To provide findings on topics including instructional practices and the roles and attitudes of faculty, a web survey was administered to almost 100 staff members from schools of nursing that received grant and scholarship funds through the Robert Wood Johnson Foundation's New Careers in Nursing program for accelerated nursing students. The study revealed that nursing school faculty have positive perceptions of working with accelerated nursing students and that instructional approaches do not differ much between traditional and accelerated nursing students. At the same time, the factors described as most predictive of accelerated nursing student success were noncognitive attributes such as motivation and commitment to the nursing profession; prior degrees in the science or health fields were not necessarily seen as predictive of the success of second-degree students working toward an accelerated bachelor's or master's degree in nursing.


Journal ArticleDOI
Khaled Barkaoui1
TL;DR: In this article, the authors describe the writing activities that test takers engage in when responding to the writing tasks in the TOEFL iBT® test and examine the effects of task type and test-taker English language proficiency (ELP) and keyboarding skills on the frequency and distribution of these activities.
Abstract: This study aimed to describe the writing activities that test takers engage in when responding to the writing tasks in the TOEFL iBT® test and to examine the effects of task type and test-taker English language proficiency (ELP) and keyboarding skills on the frequency and distribution of these activities. Each of 22 test takers with different levels of ELP (low vs. high) and keyboarding skills (low vs. high) responded to 2 TOEFL iBT writing tasks (independent and integrated) on the computer. Each participant then provided stimulated recalls about the writing activities they used when performing each writing task. Stimulated recalls were coded and the results were compared across tasks and test-taker groups. The findings indicated that the participants engaged in various construct-relevant activities, such as interacting with the writing task and resources, planning, generating, evaluating, and revising. Participants' writing activities varied most across writing tasks and, to a lesser extent, across English proficiency groups. Low keyboarding skills seem to have affected mainly activities on the independent writing task. To better understand the role of keyboarding skills in performance on the TOEFL iBT writing tasks and to address the test's extrapolation inference, future studies need to compare the writing performance of test takers with different levels of second language (L2) proficiency and keyboarding skills in test and nontest settings.

Journal Article
TL;DR: In this paper, the evaluation of human-scoring quality for an assessment of public speaking skills is summarized, using the Public Speaking Competence Rubric to score speeches.
Abstract: The purpose of this paper is to summarize the evaluation of human-scoring quality for an assessment of public speaking skills. Videotaped performances given by 17 speakers on 4 tasks were scored by expert and nonexpert raters who had extensive experience scoring performance-based and constructed-response assessments. The Public Speaking Competence Rubric was used to score the speeches. Across all of the dimensions of presentation competence, interrater reliability between expert and nonexpert raters ranged between .23 and .71. The dimensions of public speaking competence associated with the lowest interrater reliability were effectual persuasion and word choice (.41 and .23, respectively). Even expert raters, individuals with a background in teaching and evaluating oral communication, had difficulty agreeing with one another on those dimensions. Low-inference dimensions such as visual aids and vocal expression were associated with much higher levels of interrater reliability, .65 and .75, respectively. The holistic score was associated with an interrater reliability of .63. These results point to the need for a significant investment in task, rubric, and training development for the public speaking competence assessment before it can be used for large-scale assessment purposes.

Journal ArticleDOI
TL;DR: It is found that the choice of Bayesian (prior) and non-Bayesian (no prior) estimators was of more practical significance than the choice of number-correct versus item-pattern scoring for the extreme proficiency level examinees.
Abstract: The purpose of this inquiry was to investigate the effectiveness of item response theory (IRT) proficiency estimators in terms of estimation bias and error under multistage testing (MST). We chose a 2-stage MST design in which 1 adaptation to the examinees' ability levels takes place. It includes 4 modules (1 at Stage 1, 3 at Stage 2) and 3 paths (low, middle, and high). When creating 2-stage MST panels (i.e., forms), we manipulated 2 assembly conditions in each module, such as difficulty level and module length, to see if any interaction existed between IRT estimation methods and MST panel designs. For each panel, we compared the accuracy of examinees' proficiency levels derived from 7 IRT proficiency estimators. We found that the choice of Bayesian (prior) and non-Bayesian (no prior) estimators was of more practical significance than the choice of number-correct versus item-pattern scoring. For the extreme proficiency levels, the decrease in standard error compensated for the increase in bias in the Bayesian estimates, resulting in smaller total error. Possible score changes caused by the use of different proficiency estimators would be nonnegligible, particularly for the extreme proficiency level examinees. The impact of misrouting at Stage 1 was minimal under the MST design used in this study.
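The Bayesian versus non-Bayesian contrast examined above can be illustrated with a small sketch: an EAP estimator with a standard normal prior shrinks extreme proficiency estimates toward the prior mean, whereas a maximum-likelihood estimator (no prior) does not. The 2PL item parameters, response pattern, and theta grid below are invented for illustration and do not reproduce the study's MST design or its seven estimators.

```python
# Hedged sketch: ML (no prior) vs. EAP (N(0,1) prior) proficiency
# estimation under a 2PL model, using a simple theta grid.
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def loglik(theta, items, resp):
    return sum(math.log(p2pl(theta, a, b) if u else 1 - p2pl(theta, a, b))
               for (a, b), u in zip(items, resp))

def mle(items, resp, grid):
    """Grid-search maximum-likelihood estimate (no prior)."""
    return max(grid, key=lambda t: loglik(t, items, resp))

def eap(items, resp, grid):
    """Expected a posteriori estimate with a standard normal prior."""
    post = [math.exp(loglik(t, items, resp)) * math.exp(-t * t / 2) for t in grid]
    return sum(t * w for t, w in zip(grid, post)) / sum(post)

items = [(1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.0, 1.2)]  # (a, b) pairs
grid = [i / 10 - 4 for i in range(81)]                     # theta in [-4, 4]
all_correct = [1, 1, 1, 1]
print(mle(items, all_correct, grid), round(eap(items, all_correct, grid), 2))
```

For an all-correct pattern the ML estimate runs to the edge of the grid, while the prior pulls the EAP estimate back toward zero, which mirrors the bias/standard-error trade-off at extreme proficiency levels described above.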

Journal ArticleDOI
TL;DR: TextEvaluator as mentioned in this paper is a text analysis tool designed to help teachers, curriculum specialists, textbook publishers, and test developers select texts that are consistent with the text complexity guidelines specified in the Common Core State Standards.
Abstract: The TextEvaluator® text analysis tool is a fully automated text complexity evaluation tool designed to help teachers, curriculum specialists, textbook publishers, and test developers select texts that are consistent with the text complexity guidelines specified in the Common Core State Standards. This paper documents the procedure used to align the TextEvaluator reporting scale with the Common Core text complexity scale and provides score ranges for use when placing texts into grade bands. Three evaluations of the proposed score ranges are reported: one implemented with respect to the set of 168 exemplar texts provided in Appendix B of the Common Core State Standards, one implemented with respect to a set of 10 career texts, and one implemented with respect to a set of 59 texts selected from textbooks assigned in first-year, credit-bearing college courses. Results suggest that the proposed ranges can help users determine an appropriate grade band placement for any text that has been evaluated by TextEvaluator, including informational, literary, and mixed texts.
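Grade-band placement from score ranges amounts to a simple interval lookup, sketched below. The band labels and cutoff scores are invented placeholders, not TextEvaluator's actual published ranges.

```python
# Toy sketch of score-range-based grade-band placement. The cutoffs and
# band labels below are invented placeholders, not TextEvaluator's ranges.
BANDS = [
    ((100, 525), "2-3"),
    ((526, 720), "4-5"),
    ((721, 865), "6-8"),
    ((866, 985), "9-10"),
    ((986, 1135), "11-CCR"),
]

def grade_band(score):
    """Return the grade band whose score range contains the given score."""
    for (lo, hi), band in BANDS:
        if lo <= score <= hi:
            return band
    return "out of range"

print(grade_band(700))
```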

Journal ArticleDOI
TL;DR: This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses of a pilot administration of the Test of English-for-Teaching (TEFT) within the ELTeach™ framework.
Abstract: This research report presents a summary of research and development efforts devoted to creating scoring models for automatically scoring spoken item responses of a pilot administration of the Test of English-for-Teaching (TEFT™) within the ELTeach™ framework. The test consists of items for all four language modalities: reading, listening, writing, and speaking. This report only addresses the speaking items, which elicit responses ranging from highly predictable to semipredictable speech from nonnative English teachers or teacher candidates. We describe the components of the system for automated scoring, comprising an automatic speech recognition (ASR) system, a set of filtering models to flag nonscorable responses, linguistic measures relating to the various construct subdimensions, and multiple linear regression scoring models for each item type. Our system is set up to simulate a hybrid system whereby responses flagged as potentially nonscorable by any component of the filtering model are routed to a human rater, and all other responses are scored automatically by our system.
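The routing logic of such a hybrid system can be sketched as follows: any response flagged by a filtering model is sent to a human rater, and the rest receive a linear-regression score. The filters, feature names, and regression weights below are invented placeholders; the operational system's ASR-based filtering and scoring models are far richer.

```python
# Hedged sketch of hybrid routing: filtered responses go to a human rater,
# all others are scored by a linear model. All names and weights are
# hypothetical, not the system's actual features.
def too_quiet(resp):       # hypothetical filter: near-silent audio
    return resp["speech_seconds"] < 1.0

def off_topic(resp):       # hypothetical filter: no overlap with the prompt
    return resp["prompt_overlap"] < 0.05

FILTERS = [too_quiet, off_topic]

# Hypothetical multiple linear regression model: intercept + weighted features.
WEIGHTS = {"fluency": 0.8, "pronunciation": 1.1, "content_coverage": 1.5}
INTERCEPT = 0.5

def score(resp):
    if any(f(resp) for f in FILTERS):
        return ("human_rater", None)     # route nonscorable responses out
    raw = INTERCEPT + sum(w * resp[k] for k, w in WEIGHTS.items())
    return ("automated", round(raw, 2))

resp = {"speech_seconds": 22.0, "prompt_overlap": 0.4,
        "fluency": 0.9, "pronunciation": 0.7, "content_coverage": 0.8}
print(score(resp))
```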

Journal ArticleDOI
TL;DR: In this article, single location indices for ordinal polytomous items are proposed and studied, based on the item category response functions (ICRFs) and the item response function (IRF).
Abstract: Polytomous items are typically described by multiple category-related parameters; situations arise, however, in which a single index is needed to describe an item's location along a latent trait continuum, such as item selection in computerized adaptive testing or test assembly. Therefore, single location indices for ordinal polytomous items are proposed and studied. The proposed location indices (LIs) are mathematically derived from the item category response functions (ICRFs) and the item response function (IRF) for polytomous items. The ICRF approach yields three indices (LImean, LItrimmed mean, and LImedian), and the IRF approach yields one proposed index, LIIRF. An empirical example using real items is presented to aid understanding of the new location indices, and possible testing applications in which the proposed item location indices would be useful are discussed.
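As one concrete reading of an IRF-based index, the sketch below finds, for a generalized partial credit item, the theta at which the expected item score equals the midpoint of the score range. This is one plausible way such an LIIRF could be defined; the item parameters are invented, and the paper's exact derivations may differ.

```python
# Hedged sketch of an IRF-based location index for a generalized partial
# credit (GPCM) item: solve for the theta where the expected score equals
# the midpoint of the score range. Parameters are invented.
import math

def gpcm_probs(theta, a, bs):
    """Category probabilities for scores 0..len(bs) under a GPCM item."""
    nums = [1.0]
    for b in bs:
        nums.append(nums[-1] * math.exp(a * (theta - b)))
    total = sum(nums)
    return [n / total for n in nums]

def expected_score(theta, a, bs):
    """Item response function: expected score at theta."""
    return sum(k * p for k, p in enumerate(gpcm_probs(theta, a, bs)))

def li_irf(a, bs, lo=-6.0, hi=6.0, tol=1e-8):
    """Bisection for the theta where the expected score hits the midpoint.

    Works because the GPCM expected score is monotone increasing in theta.
    """
    target = len(bs) / 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_score(mid, a, bs) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(li_irf(1.0, [-0.5, 0.2, 1.0]), 3))  # step parameters are invented
```

For step parameters that are symmetric about zero, this index lands at zero, which matches the intuition that the index summarizes where on the trait continuum the item is centered.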