Showing papers in "Personnel Psychology in 1982"


Journal Article
TL;DR: In this article, the reliability and validity of the interview, methodological issues, decision making, interviewer training, minority characteristics, nonverbal behavior, interviewee characteristics, and interviewee training are reviewed and summarized.
Abstract: After a quick recapitulation of previous reviews of the employment interview, recent research from about 1975 is reviewed and summarized. Research dealing with the reliability and validity of the interview, methodological issues, decision making, interviewer training, minority characteristics, nonverbal behavior, interviewee characteristics, and interviewee training is summarized. Trends and directions are noted, suggestions for further research extended, and a discussion of why persistence in the use of interview exists is presented.

591 citations


Journal Article
TL;DR: In this article, the validity, adverse impact and fairness of eight categories of alternatives were reviewed and the feasibility of operational use of each type of alternative in an employment setting was also discussed.
Abstract: Despite extensive evidence that tests are valid for employee selection, Federal Guidelines have urged employers to seek alternative selection procedures that are equally valid but have less adverse impact on minorities. Research on the validity, adverse impact and fairness of eight categories of alternatives was reviewed. Feasibility of operational use of each type of alternative in an employment setting was also discussed. Only biodata and peer evaluation were supported as having validities substantially equal to those for standardized tests. Previous reviews and more recent research indicated that interviews, self-assessments, reference checks, academic achievement, expert judgment and projective techniques had levels of validity generally below those reported for tests. Data, where available, offered no clear indication that any of the alternatives met the criterion of having equal validity with less adverse impact. Results are discussed and several additional promising alternatives are described.

450 citations


Journal Article
TL;DR: In this article, Wherry developed a theory of rating based on several years of research and a careful analysis of the rating process, in which an accurate rating is seen as a function of three major components: performance of the ratee, observation of that performance by the rater, and recall of those observations by the rater.
Abstract: Based on several years of research and a careful analysis of the rating process, Wherry developed a theory of rating. An accurate rating is seen as being a function of three major components: Performance of the ratee, observation of that performance by the rater, and the recall of those observations by the rater. Cast in a mold of classical psychometric theory each of these components is seen as consisting of a systematic portion and a random portion. The systematic portion of each component is further broken down. The performance of the ratee is a combination of true ability or aptitude for the job and the influence of the environment. What the rater observes is a function of the performance of the ratee and bias of observation, and what the rater recalls is a result of those observations combined with a bias of recall. The development of the theory of rating unfolds by defining the various factors that affect each of these components in a series of linear equations. Various theorems and corollaries are proposed which should lead to a maximization of the true ability component of the ratee and minimize environmental influence and the bias and error components. The theorems and corollaries suggest testable hypotheses for the researcher in performance evaluation.

208 citations


Journal Article
TL;DR: In this paper, the adaptation of linear regression-based decision-theoretic equations used to estimate the dollar impact of valid selection procedures on workforce productivity to the evaluation of intervention programs designed to improve job performance is described and illustrated.
Abstract: This article describes and illustrates the adaptation of the linear-regression-based decision-theoretic equations used to estimate the dollar impact of valid selection procedures on workforce productivity to the evaluation of intervention programs designed to improve job performance. The appropriate equations are derived and explained, methods for estimating equation parameters are discussed, and the use of these equations is illustrated by means of a hypothetical example. It is concluded that in the future these methods and equations will allow psychologists to make more accurate assessments of the impact of intervention programs on workforce productivity than has heretofore been the case.

157 citations


Journal Article
TL;DR: In this article, the authors studied the relationships between boredom at work, personal characteristics, and performance in heavy truck drivers and found that boredom was negatively associated with both individual mental and physical capacity and work effectiveness.
Abstract: The present study was concerned with the relationships between boredom at work, personal characteristics and performance. Data on individual characteristics, work effectiveness and experienced boredom at work was collected from a sample of 93 heavy truck drivers by means of questionnaires and personnel file records. The results suggest that boredom while driving through a monotonous desert road was moderately, yet systematically, associated negatively with higher mental and physical individual capacity. Boredom was also negatively associated with effectiveness. The relationship between boredom and work effectiveness was significantly moderated by personal characteristics. It was found that boredom was more strongly related to work effectiveness at the lower levels of individual capacity. The results are discussed in terms of possible implications for personnel selection and placement decisions.

123 citations


Journal Article
TL;DR: In this article, the authors conducted a job analysis to define effective supervisory behavior and found no significant difference in goal difficulty between those with participatively set goals and those with self-set goals.
Abstract: A government agency wished to define effective supervisory behavior. Fifty-seven government employees participated in the job analysis. The employees were randomly assigned to one of three goal setting conditions, namely, self-set, participatively set, and assigned goals. The task required each individual to brainstorm individually job behaviors that he or she had seen make the difference between effective and ineffective job behavior as a supervisor. Goals were set in terms of the number of behaviors to be listed within 20 minutes. There was no significant difference in goal difficulty between those with participatively set goals and those with self-set goals. Goal difficulty was held constant between the participative and assigned goal conditions by imposing a goal agreed upon by an employee in the participative condition upon an employee in the assigned condition. There was no significant difference among the three goal setting conditions regarding goal acceptance or actual performance. This was true regardless of employee age, education, position level, years as a supervisor, or time employed in the public sector. The correlation between goal difficulty and performance was .62, .69, and .74, respectively, in the participative, self-set, and assigned goal conditions.

93 citations


Journal Article
TL;DR: In this paper, first-line supervisors were randomly assigned to two behavior modeling workshops designed to improve their skills in coaching and handling employee complaints, one with and one without formalized symbolic coding and symbolic rehearsal processes; the results of this field study replicated Decker's laboratory results showing the efficacy of formalized retention processes over any retention processes performed by trainees spontaneously.
Abstract: Twenty-four first-line supervisors were randomly assigned to two behavior modeling workshops. The training was designed to improve the supervisors' skills in coaching and handling employee complaints. One workshop included both formalized symbolic coding and symbolic rehearsal processes (experimental group) and one did not (control group). Trainee reaction to the training did not differ between groups; however, generalization of observational learning to a novel context was significantly better in the experimental group. The results of this field study replicated Decker's (1980) laboratory results showing the efficacy of formalized retention processes over any retention processes performed by trainees spontaneously. The implications of this line of research are discussed as well as future research needs.

84 citations


Journal Article
TL;DR: In this article, the authors evaluated the performance of three scales: behavioral observation scales, behavioral expectation scales, and trait scales in observing people on videotape and found that behavioral criteria were more resistant to rating errors than trait scales.
Abstract: Ninety business students were randomly assigned to one of three conditions where they used behavioral observation scales (BOS), behavioral expectation scales (BES), or trait scales in observing people on videotape. Half the individuals received four hours of training to minimize rating errors. Rating errors were reduced significantly regardless of the rating scale that was used. However, behavioral criteria were more resistant to rating errors than trait scales. There was no significant difference between BOS and BES on this dimension. With regard to practicality, BOS were evaluated as significantly better than BES and trait scales. BES and trait scales did not differ significantly on this measure.

75 citations


Journal Article
TL;DR: In this article, the authors measured subjective work load, time urgency, and other stress/motivation variables for management personnel taking a demanding problem-solving exam at the end of a two-week training course.
Abstract: Subjective work load, time urgency, and other stress/motivation variables were measured for management personnel taking a demanding problem-solving exam at the end of a two-week training course. Comparing measures of precourse ability and final exam performance, the primary findings were that the corrected performance scores had strong negative linear (not inverted-U) relations with both subjective work load and time urgency. General state anxiety and task involvement did not substantially relate to performance. The results are discussed in terms of the nature of this particular task and the predictions of various stress/performance theories. In problem solving or other tasks requiring novel responses, these data suggest that increases in psychological stresses like subjectively high work load and time urgency uniformly impair performance across the whole range of these variables.

75 citations


Journal Article
TL;DR: In this paper, the authors investigated the effects of assigned versus participatively set goals, and the effect of varying goal difficulty level on an arithmetic task, and found that individuals with hard assigned goals had higher performance than peers with lower goals set in a participative manner.
Abstract: Previous research comparing the effects of assigned versus participatively set goals on performance were essentially tests of the null hypothesis in that goal difficulty level was not systematically manipulated. The present laboratory study investigated the effects of assigned versus participatively set goals, and the effects of varying goal difficulty level on an arithmetic task. Eighty-six college students were assigned to either a participative goal condition or one of three assigned goal conditions. In two of the assigned goal conditions participants were assigned goals to those set in the participative condition, the difference being that individuals in one group were assigned goals at random and those in the other group were assigned goals on the basis of their premeasure scores. Participants in the third assigned goal condition were randomly assigned a goal in the top quartile of the goals set participatively. As hypothesized, individuals with hard assigned goals had higher performance than peers with lower goals set in a participative manner. Contrary to modern organizational theory, individuals with participatively set goals did not have higher performance than those with assigned goals of equal difficulty. Personality traits were not found to moderate the effects of goal setting on performance.

74 citations


Journal Article
TL;DR: In this paper, a set of standards which delineates the components and characteristics of a job analysis necessary to withstand legal scrutiny is presented, and implications of these standards are discussed.
Abstract: Selected Federal court cases are reviewed and analyzed to determine the criteria used by the courts in their assessment of job analyses in the development and validation of selection tests. A set of standards which delineates the components and characteristics of a job analysis necessary to withstand legal scrutiny is presented. Implications are discussed.

Journal Article
TL;DR: In this paper, seniority's legal status in the management of human resources is reviewed; because there is little research on seniority per se, conjecture and empirical study on the concept's salient behavioral dimensions, viz., tenure and reward, are also reviewed.
Abstract: Though an important and widespread industrial relations concept, seniority has been a neglected subject of study by behavioral scientists. The purpose of this paper is to emphasize the importance of the topic by reviewing seniority's legal status in the management of human resources. Further, while there is little research on seniority per se, conjecture and empirical study on the concept's salient behavioral dimensions, viz., tenure and reward, were reviewed. Suggestions were offered for the methodological and theoretical aspects of future research on seniority.

Journal Article
TL;DR: In this paper, the authors examined the hypothesis that resume determinateness is positively related to the evaluation given to the resume by prospective employers and found that female raters showed a tendency to be more stringent than males in their evaluations, and this effect was pronounced for older, married applicants.
Abstract: Six male and six female personnel professionals were presented with resumes in which sex, age, marital status, and academic achievement were systematically varied. The study examined the hypothesis that resume determinateness is positively related to the evaluation given to the resume by prospective employers. The provision of ambiguous information was found to distort the evaluation process but not in a consistent manner. Married males and females were evaluated more positively than were those who were single. Also married females with high academic status were evaluated more positively than were married males with high academic status, although there were no differences for single males and females. Female raters showed a tendency to be more stringent than males in their evaluations, and this effect was pronounced for older, married applicants. The study examines the implications of these results and makes suggestions for future research.

Journal Article
TL;DR: In this paper, the authors presented a study that showed that the questionnaire is eventually equivalent to the interview as a predictor of performance in military training, and that it is also very important for selecting individuals to specific tasks.
Abstract: The need for reliable and valid measures of personality and motivational factors in the prediction of success in military training was discussed. The personnel classification system currently used by the Israeli Army was briefly described. The personality factors used in that system are measured by an interview, which is individually administered to each enlisted man. The goal of the present study was to replace this interview by an objective group questionnaire, with the hope of saving time, manpower and effort without any loss to predictive validity. The criterion for validation of the system was the performance of the soldiers in elementary training. This performance was assessed by commanding officers and by peers. The results showed that the questionnaire is eventually equivalent to the interview as a predictor of performance in military training. It was concluded that the questionnaire should be preferred for economical reasons.
Manpower administration of big institutions can benefit from psychological testing in several ways. Development of valid measures for predicting performance in different levels and tasks in the institution is crucial for an efficient classification of manpower to the different branches of the institution. It is also very important for selecting individuals to specific tasks. The present paper deals with
We wish to express our gratitude to the Psychological Unit of the Israeli Army, for its participation in all stages of this study, for the helpful advice, and for the financial support. We also thank the various commanders of all the units which participated in the study, for their cooperation. The study was based on M.A. thesis conducted by Josef H. Tubiana under the

Journal Article
TL;DR: In this article, the authors examined the effects of organizational differences and rater differences on performance appraisals and found that organizational differences may restrict the generality of the findings of performance appraisal studies across organizational settings, and also may have a negative impact on the usefulness of any particular performance appraisal form in different settings.
Abstract: This study examines the effects of organizational differences and rater differences on performance appraisals. Self, peer, and supervisory ratings of performance for nurses in four hospitals and self, student, peer, and supervisory ratings for resident advisors in seven university dormitory complexes were used in this study. The analyses indicate that both organization and rater differences have significant, independent effects on performance ratings. The findings suggest that organizational differences may restrict the generality of the findings of performance appraisal studies across organizational settings. They also may have a negative impact on the usefulness of any particular performance appraisal form in different settings, and on the ability of managers to accurately interpret and compare performance ratings for individuals in different organizational subunits.

Journal Article
TL;DR: In this paper, a negative relationship between stress and perceived organizational effectiveness was found, suggesting that the type of stress moderates the stress and effectiveness relationship, and the level of dysfunctional stress provided a better explanation of variations in effectiveness levels than total stress levels.
Abstract: Several studies have found an inverted U-shaped relationship between stress and performance levels for individuals. The present study determined whether such a relationship exists between stress and the perceived effectiveness of formal organization groups. Analysis of data from four firms provided no support for the existence of such a relationship. Instead, a negative relationship between stress and perceived organizational effectiveness was found. The results suggest that the type of stress moderates the stress and effectiveness relationship. Dysfunctional stress was the dominant type of stress in all four firms. Further, the level of dysfunctional stress provided a better explanation of variations in effectiveness levels than total stress levels.

Journal Article
TL;DR: The authors investigated the impact of the racial attitudes of interviewers, applicant race, and applicant quality on the ratings given applicants, using a posttest-only control group approach analyzed by a 2 × 2 × 2 factorial ANOVA design.
Abstract: This study investigated the impact of the racial attitudes of interviewers, applicant race, and applicant quality on the ratings given applicants. This study used a posttest-only control group approach which was analyzed by a 2 × 2 × 2 factorial ANOVA design. Subjects were 176 white business administration students from a large urban university. Videotapes of simulated job interviews were produced to control applicant quality and applicant race. A black male and a white male each role-played both a high and a low quality applicant. The main effect for applicant quality was significant, accounting for 50% of the variance in applicant ratings. The main effect for race was significant but not in the predicted direction. Black applicants were rated higher than white applicants. While high quality applicants were rated highly regardless of race, the low quality black applicant was rated higher than the comparably performing white applicant. The interaction of race and interviewers' level of prejudice was significant but not in the predicted direction. Highly prejudiced subjects rated black applicants higher than white applicants. The implications of these results for further research were discussed.

Journal Article
TL;DR: In this paper, the authors investigated the existence of a moderating effect for situational control of performance variance on the relationship between individual differences and performance, and found that the results supported the predicted moderating effect.
Abstract: The present study investigated the existence of a moderating effect for situational control of performance variance on the relationship between individual differences and performance. An experimental simulation was conducted and validity coefficients were calculated. Results supported the presence of the predicted moderating effect. The implications of these data for validation research and testing programs are discussed.

Journal Article
TL;DR: Based upon recent reviews of evaluation methodology in Organization Development (OD), a description of a viable method for measuring planned organizational change is synthesized in this article, and the application of the procedure to a university student counseling center involved in an OD project utilizing an eclectic intervention is reported.
Abstract: Based upon recent reviews of evaluation methodology in Organization Development (OD), a description of a viable method for measuring planned organizational change is synthesized. The paper reports on the application of the procedure to a university student counseling center involved in an OD project utilizing an eclectic intervention. A diagnostic, reliability-tested questionnaire was used in a three year modified multiple time series research design, with a closely matched comparison organization, and data were analyzed for three types of change. Results support the contention that the measurement method is viable for accurately assessing the impact of an OD intervention and thus providing the groundwork for developing a rigorous, empirically-based theory of OD. However, several problems and tradeoffs are made explicit.

Journal Article
TL;DR: In this article, the authors examined differences in job orientation between black and white male and female business college graduates and found significant race differences on 10 of 25 job characteristics, with blacks rating 9 of these more important than whites.
Abstract: Differences in job orientation between black and white male and female business college graduates were examined. Significant race differences were found on 10 of 25 job characteristics, with blacks rating 9 of these more important than whites. Significant race by sex interactions exist on four characteristics, while sex differences were found on nine. Factor analysis indicates that blacks value long-range career objectives and structure considerably more than do whites, while their preference for intrinsic and extrinsic factors was less pronounced. Methods by which organizations can satisfy the greater importance placed on many job characteristics by blacks are explored.

Journal Article
TL;DR: In this article, interviews were conducted with union business agents on the conditions necessary for their support of a goal-setting program, goals were then assigned to truck drivers, and the results showed a significant increase in productivity for the drivers who received specific goals.
Abstract: Interviews were conducted with union business agents on conditions necessary for their support of a goal setting program. Subsequent to the interviews, goals were assigned to 39 truck drivers. The results were analyzed using a design that included a comparison group (N= 35). The results showed a significant increase in productivity for the drivers who received specific goals. When the conditions necessary for the union's support of the goal setting program were no longer met, there was a wildcat strike.

Journal Article
TL;DR: Four studies examined whether the confirmatory hypothesis testing reported by Snyder and Swann (1978), in which individuals seek evidence to confirm preconceptions formed about other people prior to interaction, generalizes to the employment interview; consistent use of confirmatory strategies was not found.
Abstract: Recent findings in the impression formation literature suggest that individuals seek evidence to confirm initial hypotheses, or preconceptions, which they form about other people prior to interaction, and that seeking confirmatory evidence makes it likely that a hypothesis will be confirmed (Snyder and Swann, 1978). Snyder and Swann suggest that the employment interview is one context in which this process can be expected to operate. Four studies are reported which examine the generalizability of these findings to the employment interview. Consistent use of confirmatory hypothesis testing strategies was not found when experienced interviewers, rather than college students, were used as subjects, nor when the study was set specifically in an employment interview setting, nor when hypotheses about characteristics other than those examined by Snyder and Swann were studied.

Journal Article
TL;DR: In this article, the authors reviewed fifty-two court cases to determine the standards set by the courts for establishing a claim of sexual harassment under Title VII of the Civil Rights Act of 1964.
Abstract: Fifty-two court cases were reviewed to determine the standards set by the courts for establishing a claim of sexual harassment under Title VII of the Civil Rights Act of 1964. Twenty-nine are discussed. Three major issues were examined in Part I of the review: (1) the gender-based nature of sexual harassment at work, (2) the direct and indirect employment-related consequences that result from the harassment, and (3) the extent of employer liability for the sexually harassing acts of their employees. Part II discussed the general principles that were distilled from the court cases and examined future trends and preventive measures, as well as the role of professionals in future research. A plan of action to combat sexual harassment at the workplace consistent with court interpretations was presented.

Journal Article
TL;DR: The Inter Face Project as discussed by the authors was a pilot study to improve race relationships between supervisors and employees in South African industry, where behavioral modeling training was used to change the behaviors of employees.
Abstract: The Inter Face Project was a pilot study to improve race relationships between supervisors and employees in South African industry. The ultimate intent of this project was to find a path for more extensive research aimed at improving inter-racial attitudes and behavior in South Africa, especially in work situations. Nevertheless, this project is only a small step in the direction of improved inter-race relationships in that country. Behavior modeling training was the vehicle for attempting to change the behaviors. Several measures were used because it was uncertain how change would be found. However, performance records were incomplete and therefore not useful. A questionnaire and expectation scale did not reveal significant change, possibly because of the small sample size and the reluctance on the part of supervisors to be candid. Structured interviews, however, indicated dramatic improvements in inter-race attitudes and observations for both supervisors and employees. Second post-test improvements (20 weeks after training) were, in fact, even greater than improvements noted at the first post-test (six weeks after training). Before and after training comments by employees and supervisors provided strong evidence of the extent of change that occurred. Several suggestions for follow-up research were outlined.

Journal Article
TL;DR: In this paper, the importance of temporal trends within absenteeism data for males and females was tested; the data support previous findings of higher absenteeism rates for women than for men and indicate the importance of temporal trends, as suggested by Dansereau, Alutto, and Markham.
Abstract: The purpose of this research is to test the importance of temporal trends within absenteeism data for males and females. The data support previous findings of higher absenteeism rates for women when compared to men. The data also indicate the importance of temporal trends as suggested by Dansereau, Alutto, and Markham (1978). Conclusions are drawn concerning the use of absence rates as dependent variables.

Journal Article
TL;DR: Paired comparison evaluations were solicited for a relatively small group (N = 20) of savings and loan association branch managers; inter-judge agreement was high, and the peer-generated evaluations assisted the officers in making acceptable promotional decisions.
Abstract: Paired comparison evaluations were solicited for a relatively small group (N= 20) of savings and loan association branch managers. Peer evaluations were obtained from 16 of these managers; supervisory evaluations were obtained from 4 officers. Inter-judge agreement (both within and between groups) was high. Further, this agreement extended beyond the derived paired comparison score to certain independently measured psychological characteristics of the persons evaluated. The peer-generated evaluations assisted the officers in making acceptable promotional decisions. In addition, discussion by the 4 officers of differences between their independently made evaluations made explicit a previously covert but potentially important difference in perspective about the determinants of managerial effectiveness. Ratings assigned by one officer reflected his implicitly heavy weight to human relations skill as a component of branch manager effectiveness; those assigned by the other 3 tended to give more weight to knowledge about financial matters related to the savings and loan industry.

Journal Article
TL;DR: In this article, the authors investigated whether narrative job descriptions could be converted to quantitative rating scores using traditional job analysis questionnaire techniques and found that the dimensions reflected important differences and similarities among the job categories and thus provided a viable means of grouping jobs into higher order families.
Abstract: The present study investigated whether narrative job descriptions could be converted to quantitative rating scores using traditional job analysis questionnaire techniques. Detailed written descriptions of 121 job categories in a military health care facility were rated using the Position Analysis Questionnaire (PAQ). Indices of interrater agreement suggested acceptable levels of agreement for job dimension scores derived from these ratings. Further, when regressed against GATB abilities estimates, the job dimension scores produced values very similar to those reported by previous studies using the PAQ. Finally, cluster analyses of the 121 job categories suggested that the dimensions reflected important differences and similarities among the job categories and thus provided a viable means of grouping jobs into higher order families. Potential uses for data derived from narrative job descriptions are discussed.

Journal ArticleDOI
TL;DR: In this paper, a behaviorally based performance appraisal system for highway patrol personnel is described, which is based on Blanz and Ghiselli's Mixed Standard Scale (MSS).
Abstract: This paper describes the development of a behaviorally based performance appraisal system. Blanz and Ghiselli's Mixed Standard Scale was used as the basis for developing the performance appraisal system for assessing the performance of highway patrol personnel. However, the particular developmental procedures described here differ in some respects from those reported in the literature. Rather than developing rating items describing general traits such as “diligence,” “initiative,” or “enthusiasm” in behavioral terms, the items in the present scale were developed to describe proficiency levels of specific job tasks. This characteristic is expected to enhance the objectivity of the evaluation system for both appraisal and job counseling purposes. The appraisal instrument was subjected to a series of reliability and validity tests that demonstrated its high reliability and validity. Although the content of the appraisal system described here included highway patrol tasks, a similar system could be developed using the procedures described for a wide variety and level of jobs.

Journal ArticleDOI
TL;DR: In this article, the approaches to synthetic validity employed by Lawshe, Guion, McCormick, and Primoff are described and compared against the Guidelines validation requirements, and Primoff's J-Coefficient approach is recommended.
Abstract: The Uniform Guidelines for Employee Selection Procedures have served to create an urgent need for efficient validation methods that can be generalized to a class of occupations. The one method currently authorized for such a purpose by the Guidelines is synthetic validation. (The Guidelines erroneously describe the synthetic validity paradigm as construct validity.) Approaches to synthetic validity employed by Lawshe, Guion, McCormick, and Primoff are described. Their extent of conformance to the Guidelines validation requirements is noted. Primoff's J-Coefficient approach is recommended for two reasons: it meets the Guidelines requirements, and under certain circumstances it permits the test user to estimate the traditional validity coefficient. An illustrated example of Primoff's method is presented.

Journal ArticleDOI
TL;DR: In this paper, the authors pointed out a serious flaw in the conceptual basis of Behavioral Observation Scales (BOS) and showed that its solution would make BOS indistinguishable from other methods already in existence.
Abstract: An earlier article by the present authors (Bernardin and Kane, 1980) pointed out a serious flaw in the conceptual basis of Behavioral Observation Scales (BOS). The present article explains this flaw in more detail and shows that its solution would make BOS indistinguishable from other methods already in existence. The focus then shifts from the conception of appraisal methods to their evaluation, with a discussion of the special problems of error capitalization that arise when item analysis is used in the development of multi-item rating scales. This discussion proceeds to describe the correct approach to removing the effects of error capitalization in the evaluation of the psychometric properties of such rating scales.