Wednesday, September 28, 2011

Effects of examiner familiarity on black, Caucasian, and Hispanic children: a meta-analysis.

This article presents a quantitative synthesis of examiner familiarity effects on Caucasian and minority students' test performance. Fourteen controlled studies were coded in terms of methodological quality (high vs. low) and race-ethnicity (Caucasian vs. Black and Hispanic). An analogue to analysis of variance conducted on weighted unbiased effect sizes indicated that examiner familiarity produced a significant effect, with Caucasian and minority examinees' test performance raised by .05 and .72 standard deviations, respectively. Examiner familiarity's differential effect on Caucasian and minority examinees did not interact with the methodological quality of the studies. Nevertheless, limitations of the extant data base require caution in drawing implications for assessment practice.

Two decades ago, Dunn (1968) observed that minority groups were over-identified as handicapped. He believed this overrepresentation was caused by discriminatory intelligence and achievement tests. Dunn and others (e.g., Cole & Bruner, 1972) have contended that these tests are biased primarily because they are ethnocentric: Test content is drawn exclusively from White middle class experience. Evidence for minority disproportionality in special education and claims of biased testing were influential in heralded court cases in the 1970s (e.g., Larry P. v. Wilson Riles, 1971), which curtailed intelligence testing in many school districts (see Bersoff, 1981).

Nevertheless, many measurement specialists, school psychologists, and others are increasingly skeptical that many well-known and widely used intelligence and achievement tests are biased against minorities. Reschly (1981), for example, has pointed out that subjective judgment, rather than data, often has been the basis for charges that these tests are ethnocentric. Recent empirical investigations of such tests' content, construct, and criterion validity have failed to show bias (e.g., Oakland, 1983).

In this intense, sustained, and well-publicized debate, both sides tend to focus narrowly on the test instrument and virtually ignore the context in which assessment occurs. Research has infrequently addressed contextual factors such as (a) examinees' interpretation of the purpose of testing and comprehension of test instructions and (b) examiners' personality, pretest information on examinees, attitudes about the legitimacy of testing, and choice of test location. This paucity of research on context is not surprising, given that we tend to conceptualize the test situation as decontextualized; that is, a setting in which extra-test factors can be controlled and their effects on performance neutralized (see Sigel, 1974). Surprising or not, our lack of interest in test context prevents us from knowing whether typical situational factors in testing affect minority and nonminority children differently.

One exception to the foregoing is the specific question of whether Black children achieve higher scores when tested by Black, rather than by White, examiners, an issue receiving moderate attention from researchers (see Sattler & Gwynne, 1982). Another contextual variable explored with relative frequency, although not with respect to minority assessment, is examiner familiarity.

D. Fuchs and associates have demonstrated that language handicapped children obtain higher scores when tested by familiar, rather than by unfamiliar, examiners and that this performance pattern appears robust (see Fuchs, D., Featherstone, Garwick, & Fuchs, 1984; Fuchs, D., Fuchs, Dailey, & Power, 1985; Fuchs, D., Fuchs, Garwick, & Featherstone, 1983; Fuchs, L. S., & Fuchs, 1984). Moreover, it appears that unfamiliar examiners depress the performance of language handicapped, but not nonhandicapped, children (Fuchs, D., Fuchs, Power, & Dailey, 1985), indicating that examiner unfamiliarity is a source of systematic error or bias in the assessment of language handicapped children. The importance of this finding is underscored by the fact that most examiners are strangers to their examinees (Fuchs, D., 1981).

Since examiner unfamiliarity is part of the test procedure, rather than the test instrument per se, we choose to refer to this systematic error as "test procedure bias." This is similar to "situational bias," which Jensen (1981) defined as "conditions in the test situation, such as the race, language, or manner of the tester, that could differentially affect the test performance of persons of different races or cultural backgrounds" (p. 137). Jensen properly distinguishes this type of bias from (a) external indicators of bias, whereby test scores are related to other variables external to the test or test situation (such as in a test's predictive validity); and (b) internal indicators of bias, or psychometric properties of the test (such as a test's reliability and rank order of item difficulty).

Given that unfamiliar examiners appear to negatively bias the test procedure with certain handicapped children, one may ask whether examiner unfamiliarity constitutes a similar bias against minority pupils. If so, then the ubiquitous procedure of employing unfamiliar examiners contributes to a spuriously low performance of minority children and increases the likelihood that they will be identified inaccurately as handicapped. Such a possibility should be of concern to examiners, those who set professional standards for testing, and parents and teachers of minority students.

Thus, a quantitative synthesis was conducted of the examiner unfamiliarity literature to determine the importance of this contextual factor to minority (i.e., Black and Hispanic) and Caucasian students.

METHOD

Search Procedure

The search for pertinent studies included:

1. A computer search of three on-line data bases: ERIC (from 1966), Psych Info (from 1967), and Dissertation Abstracts (from 1927).
2. A manual search of American Journal of Mental Deficiency; Child Development; Developmental Psychology; Exceptional Children; Journal of Abnormal and Social Psychology; Journal of Consulting and Clinical Psychology; Journal of Experimental Child Psychology; Journal of Genetic Psychology; Journal of Speech and Hearing Disorders; Language, Speech, and Hearing in the Schools; Merrill-Palmer Quarterly; and Psychology in the Schools (1965-1982, inclusive).
3. Identification of references within selected psychological and educational assessment textbooks as well as in all identified investigations.

A study was considered for inclusion if it compared examiner familiarity to unfamiliarity in terms of effects on examinees' performance during individualized testing. Familiarity was defined broadly, including either children's long-term or experimentally induced acquaintance with an examiner. Long-term acquaintance denotes a relatively intimate relationship enduring over weeks or months (e.g., a teacher-pupil alliance), if not years (e.g., a mother-child relationship). Experimentally induced acquaintance typically refers to an examiner's comparatively brief interaction with an examinee prior to testing. Examiner unfamiliarity signifies a condition in which examiner and examinee are virtual strangers, but one in which the examiner has exercised typical procedures for establishing rapport.

The search yielded 22 studies, of which 14 provided unambiguous data on Caucasian and/or minority examinees' performance in familiar and unfamiliar examiner conditions. (See Fuchs, D., & Fuchs, 1986a, for references.) Of these studies, 6 involved only Caucasian children, 6 included only minority (Black and/or Hispanic) children, and 2 employed both Caucasian and minority subjects. Thus, an equal number of studies (N = 8) provided data on minority and Caucasian pupils' performance in the two examiner conditions.

Of the investigations, 11 were published and 3 were unpublished. A total of 989 subjects participated; 426 were Black or Hispanic and 563 were Caucasian. The sex of 442 subjects (45%) was not reported. Among the remaining 547 participants, 235 (43%) were female and 312 (57%) were male. Across 12 of 14 studies providing pertinent information, 162 examiners were used. Tests administered in the investigations were identified as intelligence (7 studies), speech/language (5 studies), or educational achievement (2 studies) measures. (See Table 1 of Fuchs, D., & Fuchs, 1986a, which describes each study's test participants, major substantive variables, methodological quality, and unbiased effect sizes.)

Data Obtained from Studies

Results were transformed to a common metric, effect size. Effect sizes were derived by determining the mean difference between examinees' scores in familiar and unfamiliar examiner conditions and dividing this difference by the standard deviation of examinees' scores in the unfamiliar condition (Glass, McGaw, & Smith, 1981).

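As a minimal sketch of the effect size just defined, the function below divides the familiar-minus-unfamiliar mean difference by the standard deviation of scores in the unfamiliar condition. The function name and the sample numbers are hypothetical and are not taken from any of the synthesized studies.

```python
def effect_size(mean_familiar, mean_unfamiliar, sd_unfamiliar):
    """Mean difference scaled by the unfamiliar-condition standard deviation."""
    if sd_unfamiliar <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_familiar - mean_unfamiliar) / sd_unfamiliar


# Hypothetical example: a 6-point gain on a test whose
# unfamiliar-condition standard deviation is 12.
print(effect_size(96.0, 90.0, 12.0))  # prints 0.5
```

A positive value means examinees scored higher with the familiar examiner; a negative value means the reverse.
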
Some of the studies reported more than one effect. In all but two instances, a median effect size of examiner familiarity/unfamiliarity was calculated for each study. Exceptions were the two investigations incorporating separate groups of Caucasian and minority examinees in the same experiment. In each of these studies two effect sizes were reported, one for minority and one for Caucasian examinees. Thus, 16 effect sizes (8 for Caucasian and 8 for minority children) were derived. Each effect size was converted to an unbiased effect size (UES), correcting for the inconsistency in estimating true from observed effect sizes (see Hedges, 1981). In combining UESs, weighted averages were calculated to account for the variance associated with this metric (see Hedges, 1984).

Methodological Quality of Studies

Effects of examiner familiarity/unfamiliarity were related to a composite procedural variable, indicating the overall methodological quality of each investigation. Derivation of this variable was based on an analysis of nine design-related features: (a) assignment of examinees to examiners; (b) assignment of examinees to treatments; (c) examiner expectancy; (d) fidelity of treatment; (e) multiple treatment effects; (f) number of examiners; (g) order of testing; (h) scoring; and (i) technical adequacy of dependent measure. (See Fuchs, D., & Fuchs, 1986a, for standards associated with each methodological feature.)

One of the authors and a colleague, blind to the purposes of this study, independently scored six (43%) randomly selected investigations. Average agreement across all methodological features was .89, ranging from .67 to 1.00. Interrater agreement was calculated using the following formula (Coulter cited in Thompson, White, & Morgan, 1982):

Percentage of agreement = (agreements between raters A and B) / (agreements and disagreements between raters A and B + omissions by rater A + omissions by rater B).

Disagreements were resolved through discussion between the raters.

Since one study provided insufficient information to determine methodological quality, we evaluated the quality of 13 studies using a 4-step procedure. First, the investigations were coded "acceptable" (1) or "unacceptable" (0) on each design dimension. Second, as a means of indicating relative importance, a weight of 1 or 2 was assigned each design feature. Third, a composite score was generated for each study by multiplying the coded values (1 or 0) by the assigned weights (1 or 2), summing these products, and dividing the sum by the number of applicable study characteristics. Finally, we developed a frequency distribution of these composite scores, which facilitated identification of 7 high- and 6 low-quality studies. (These steps to determine study quality are described in greater detail in Fuchs, D., & Fuchs, 1986a.)

RESULTS

A test for the homogeneity of effect size (Hedges, 1982), undertaken to determine whether the population effect size was constant across Caucasian and minority UESs, yielded a significant value, χ² (15, N = 16) = 89.22, p < .01. Therefore, additional analyses were conducted to explain variations in UESs by examinees' Caucasian/minority status. To compare magnitude of UESs of Caucasian and minority examinees, Hedges's (1984) chi-square analogue to analysis of variance was employed.

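The unbiased effect sizes, their weighted average, and the homogeneity statistic used in this synthesis can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the correction factor uses the commonly cited approximation 1 - 3/(4m - 1), with m taken as the degrees of freedom of the standard deviation; the weights are inverse sampling variances supplied by the caller; and all function names and input values are hypothetical.

```python
def unbiased_effect_size(d, df):
    """Shrink an observed effect size with the approximate
    small-sample correction factor 1 - 3 / (4*df - 1)."""
    if df < 1:
        raise ValueError("df must be at least 1")
    return (1.0 - 3.0 / (4.0 * df - 1.0)) * d


def weighted_mean(effects, variances):
    """Inverse-variance weighted mean of effect sizes.
    Returns the mean and the variance of that mean."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    mean = sum(w * d for w, d in zip(weights, effects)) / total
    return mean, 1.0 / total


def homogeneity_q(effects, variances):
    """Weighted sum of squared deviations from the weighted mean.
    Under homogeneity it is referred to a chi-square distribution
    with k - 1 degrees of freedom (k = number of effect sizes)."""
    mean, _ = weighted_mean(effects, variances)
    q = sum((d - mean) ** 2 / v for d, v in zip(effects, variances))
    return q, len(effects) - 1


# Hypothetical inputs: three effect sizes with assumed sampling variances.
effects = [unbiased_effect_size(d, df) for d, df in [(0.9, 20), (0.1, 30), (0.6, 25)]]
variances = [0.10, 0.08, 0.09]
mean, var_of_mean = weighted_mean(effects, variances)
q, df = homogeneity_q(effects, variances)
print(mean, var_of_mean, q, df)
```

With 16 effect sizes the statistic has 15 degrees of freedom, which matches the degrees of freedom reported for the homogeneity test.
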
The mean quality rating for the 8 studies involving Caucasian examinees was .99 (SD = .40); the average quality rating for 7 studies associated with minority examinees was .91 (SD = .40). This difference was not statistically significant, t(13) = .39, ns.

For Caucasian examinees, the average weighted UES was .05 (v = .073), z = .72, ns. The average weighted UES for Black and Hispanic examinees was .72 (v = .096), z = 7.47, p < .001. A chi-square analogue to analysis of variance indicated that this difference was statistically significant, χ² (1, N = 16) = 30.35, p < .001. The minority group's UES indicates that, given a normative test (such as an intelligence measure) with a population mean of 100 and a standard deviation of 15, the use of a familiar examiner would raise the typical minority student's score from 100 to 111. In contrast, the Caucasian group's UES suggests virtually no change in score as a function of examiners' familiarity/unfamiliarity. In terms of Cohen's (1977) well-known U3 (or percentage of nonoverlap) statistic, the upper 50% of the minority students' distribution of scores in the familiar examiner condition exceeded 76% of the distribution of scores in the unfamiliar examiner condition.

DISCUSSION

Whereas Caucasian students performed similarly in familiar and unfamiliar examiner conditions, Black and Hispanic children scored significantly and dramatically higher with familiar examiners. This Caucasian versus minority dissimilarity represents a difference between differences (see Kaufman, Dudley-Marling, & Serlin, 1986); that is, minority examinees' differential performance across the two examiner conditions was greater than Caucasian examinees' differential performance. Thus, the disparity between the two groups does not represent a simple mean difference. Rather, it is conceptually similar to an interaction, whereby the independent variable, examiner familiarity, has a different effect on various aspects or expressions of another independent variable, race-ethnicity.

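The score interpretation given in the Results follows from two short calculations, sketched here with the reported UESs of .72 and .05. The function names are hypothetical, and the percentage-of-nonoverlap figure assumes normally distributed scores with equal spread in both conditions.

```python
import math


def shifted_score(effect_size, mean=100.0, sd=15.0):
    """Expected score after a shift of `effect_size` standard deviations."""
    return mean + effect_size * sd


def u3(effect_size):
    """Proportion of the comparison distribution that falls below the
    median of the shifted distribution (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(effect_size / math.sqrt(2.0)))


print(round(shifted_score(0.72)))      # prints 111
print(round(shifted_score(0.05), 2))   # prints 100.75
print(round(100 * u3(0.72)))           # prints 76
```
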
This difference between the two groups is described succinctly by the aforementioned average UESs of minority and Caucasian subjects. It also emerges as patterns in the data. Five of 8 investigations involving minority subjects were associated with UESs ranging from .58 to 1.44. Cohen's (1977) well-known rule of thumb indicates these UESs range in magnitude from "moderately strong" to "strong." Contrastingly, 5 of 8 studies with Caucasian examinees were associated with UESs ranging from .01 to .23. Cohen's guidelines suggest such UESs are "weak" to nonexistent. (See Table 1 in Fuchs, D., & Fuchs, 1986a.) Thus, the reported average UESs of minority (.72) and Caucasian (.05) subjects were not produced by the results of one or two discrepant investigations; rather, these divergent UESs reflect equally dissimilar patterns of UESs among studies involving minority and Caucasian examinees.

Internal Validity

Does examiner familiarity, then, selectively bias the performance of Black and Hispanic examinees, and represent test procedure bias in the assessment of minority children? In short, are findings from our synthesis true? Truth, here, has two important meanings. The first, often described as internal validity, refers to whether race-ethnicity was principally and causally related to minority examinees' poorer performance with unfamiliar examiners. At this point, we are unsure. Our uncertainty is dictated by two aspects of the extant data base.

First, families of minority subjects were described consistently in terms of low SES, whereas Caucasian subjects came from low and middle SES backgrounds. Since all Black and Hispanic examinees were of low SES, it is difficult to determine which of the two characteristics, race-ethnicity or SES, may be more important in explaining examiner familiarity effects. Second, in 12 investigations, either Caucasian or minority children were subjects, leaving only two studies that compared differential performance of Caucasian to minority examinees within the same experimental design. In one of these two salient studies, Caucasian children demonstrated greater performance with familiar examiners. This finding, of course, contradicts the overall pattern of results and, we believe, introduces another important note of caution in any interpretation of the meta-analysis.

External Validity

A second sort of truth, frequently discussed as external validity, addresses the issue of generalizability. Applied to our meta-analysis, this concern boils down to whether Caucasian and minority subjects' performances are representative of Caucasian and minority children. This question, too, is currently unanswerable and, again, it is due to the nature of the data. An obvious limitation of the data base in this regard is that it comprises a small group of 14 studies. Moreover, most examinees were preschoolers and elementary school children; many examiners were not professionally trained; and only three investigations involved Hispanic subjects (see Table 1 in Fuchs, D., & Fuchs, 1986a). Thus, the data provide minimal evidence on the importance of examiner familiarity to older and Hispanic children and to trained, experienced examiners.

Despite these serious constraints and necessary caveats, the pattern of findings is clear and intriguing. At the very least, the data compel us to ask, "What if examiner unfamiliarity biases the assessment process against minority examinees?" If such were the case, there would be very important implications for practice. For example, test developers' use of unfamiliar examiners to generate normative data and indices of validity (see Fuchs, D., Fuchs, Benowitz, & Barringer, 1987) would be problematic for minority pupils. Comparing minority students' suboptimal performance with unfamiliar examiners to the more maximal performance of largely Caucasian normative populations could result in spuriously low scores and improperly restrictive educational placements of minority children. In such an event, examiner unfamiliarity would be a partial explanation for the frequently noted overrepresentation of minorities in special education classrooms. It also would represent a condition under which disproportionality of placement constitutes inequity of treatment, as defined by the National Research Council's Panel on Selection and Placement of Students in Programs for the Mentally Retarded (see Messick, 1984).

If use of unfamiliar examiners selectively biases assessment against minority examinees, an apparent remedy would be to require examiners to become familiar with such children prior to testing. Less clear is how much pretest contact is necessary. Following a review of pertinent literature, we have estimated that a minimum of 1 hour of examiner-examinee interaction is required to obtain reliable familiarity effects (see Fuchs, D., & Fuchs, 1986b). We believe this estimate may contrast sharply with conventional practice. In a study of user manuals of 20 widely used preschool IQ and speech/language tests, we found that 13 manuals encourage examiner friendliness, typically defined as demonstrating warmth, maximizing comfort, and reducing anxiety or suspicion. However, only two manuals prescribe pretest contact, described as establishing rapport gradually by meeting with the examinee on one or more occasions prior to testing (Fuchs, D., 1987). If, as we suspect, examiners take their cues from user manuals and if the manuals in our study are representative, one could expect few examiners to engage regularly in prior contact with examinees.

These implications are presented to underscore the importance of determining whether examiner unfamiliarity as well as other typical factors in the test situation negatively bias assessment against minority examinees. If future research corroborates results of our meta-analysis, an important related task will be to explore why examiner unfamiliarity affects minority and Caucasian children differently. It has been suggested that many minority examinees (a) are relatively unmotivated to perform well (e.g., Katz, 1968); (b) are more likely to experience test anxiety (e.g., Hawkes & Furst, 1971) because of fear of failure, low self-concept, and unfamiliarity with test procedures (Samuda, 1975); and (c) are hostile toward White examiners and, as a result, concerned about controlling this feeling rather than concentrating on task demands (e.g., Shade, 1982).
This purported hostility may be connected to reports that minority examinees tend to misconstrue (a) information and rhetorical questions as demands for accountability (e.g., Goody, 1978) and (b) direct question-answer sequences as punishment (e.g., Philips, 1983) or feedback that an incorrect choice has been made (e.g., Goodnow, 1984).

Studies supporting such suggestions, however, have been sporadic; their validity and importance in explaining minority examinees' possibly poorer performance with unfamiliar examiners remain open to question. Moreover, the bulk of this research assumes that examinee characteristics, like presumed test anxiety, are responsible for observed performance. Investigators infrequently have explored possible effects of examiner characteristics, or quality of interaction between test participants, as explanations for minority children's performance.

Assuming the test situation to be bidirectional, we (Fuchs, L. S., & Fuchs, 1984) explored whether pretest contact influences examinees' behavior and/or examiners' inaccuracy of scoring. Language impaired children were tested by familiar and unfamiliar examiners on a comprehensive language measure. Then certified speech clinicians, who did not know the study's purpose, examinees, or examiners, scored all performances from videotaped recordings of the test sessions. Results demonstrated that examinees performed consistently and significantly higher with familiar examiners, regardless of whether the scorer was the actual examiner or an independent rater. At the same time, however, familiar examiners evidenced greater inaccuracy (overestimation) in scoring than did unfamiliar examiners, suggesting that, for certain handicapped examinees, familiarity influences both examinee and examiner behavior.

Although perhaps trite, it nevertheless is true that more research is sorely needed to determine whether, and if so how, unfamiliar examiners selectively depress the performance of minority children. Until such determination is made, we believe it is precipitous, if not incorrect, to claim testing is unbiased toward minority children.
