Downing, L. L. (1994). Criterion shaped behaviour: Pitfalls
of performance appraisal.
International Journal of Selection and Assessment, 2,
1-21.
PART 3
PROBLEMS OF CRITERION DEFICIENCY
A measure will be Criterion
Deficient to the extent that it fails to assess one or more of the behaviors
or characteristics of the Ultimate Criterion, i.e., of the hypothetical,
"ideal" performance. Deficiency CSBs will include reductions in any
factors of ideal performance that are not measured. To know what
these will be requires a list of Ultimate Criterion Factors, and a list
of Actual Criterion Factors. A hypothetical list, pertaining to teacher
evaluation, will be used for purposes of illustration (see Figure 2).
| Actual Criterion Factors | Ultimate Criterion Factors |
| --- | --- |
| 1. Average score of students on national objective test. | 1. Preparation of students to take objective tests. |
| 2. Percentage of students who eventually [...] | 2. Preparation of students for eventual graduation. |
| | 3. Teaching students to write effectively. |

Figure 2: Hypothetical Situation Illustrating Criterion Deficiency
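The comparison of the two lists in Figure 2 can be sketched as a simple set operation. This is only an illustration and not part of the original analysis: the factor labels are shortened paraphrases of Figure 2, the set `ASSESSED` rests on the assumption that Actual Criterion Factors 1 and 2 assess Ultimate Criterion Factors 1 and 2, and the function name `classify_factors` is hypothetical.

```python
# Ultimate Criterion Factors from Figure 2 (labels shortened for this sketch).
ULTIMATE = {
    "objective test preparation",
    "graduation preparation",
    "effective writing",
}

# Assumption: the two Actual Criterion Factors assess Ultimate Factors 1 and 2.
ASSESSED = {
    "objective test preparation",
    "graduation preparation",
}


def classify_factors(ultimate, assessed):
    """Split factors into relevant, deficient, and contaminating sets."""
    return {
        "relevant": ultimate & assessed,       # measured and part of the ideal
        "deficient": ultimate - assessed,      # part of the ideal, not measured
        "contaminating": assessed - ultimate,  # measured, not part of the ideal
    }


result = classify_factors(ULTIMATE, ASSESSED)
print(sorted(result["deficient"]))  # ['effective writing']
```

Under these assumptions, teaching students to write effectively is the unmeasured factor for which a Deficiency CSB (reduced effort) would be expected.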
PROBLEMS OF CRITERION CONTAMINATION
A measure
will be Criterion Contaminated to the extent that scores can be influenced
by increasing factors that are not part of the Ultimate Criterion.
Contamination CSBs will include efforts to increase these factors, and
more importantly will include decreases in Criterion Relevant and Criterion
Deficient Factors resulting from such efforts. To illustrate these
Criterion Contamination effects, let us devise a second set of hypothetical
lists (see Figure 3).
| Actual Criterion Factors | Ultimate Criterion Factors |
| --- | --- |
| 1. Average score of students on national objective test. | 1. Preparation of students to take objective tests. |
| 2. Percentage of students who eventually [...] | 2. Preparation of students for eventual [...] |
| 3. Supervisor ratings of the writing [...] a. Reflecting actual quality of students' [...] | 3. Teaching students to write effectively. |

Figure 3: Hypothetical Situation Illustrating Criterion Contamination
We will assume that the Actual
Criterion Measure adequately assesses Factors 1 and 2, which overlap with
the Ultimate Criterion and are, therefore, Criterion Relevant, leading
to desirable Relevant CSBs. We will also assume that teaching students
to write effectively, Factor 3, has been assessed by the supervisor's rating,
and is no longer Criterion Deficient. Unfortunately, in assessing
Factor 3 with a subjective rating by the supervisor, we have unintentionally
added a contaminating bias factor by means of which it is possible for
teachers to increase their scores through other, possibly easier, and incompatible
Contamination CSBs. In our example, if efforts to be liked by the
supervisor prove to be an easier and more reliable means of improving one's
rating, and hence one's total score on the Criterion Measure, than do efforts
to enhance Factors 1, 2, and 3a, and if those efforts take time and energy
away from the pursuit of 1, 2, and 3a, we will expect a decrease in these
Relevant Factors, as well as decreases in any desired but unmeasured Deficiency
Factors such as were discussed previously.
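The trade-off described above can be made concrete with a toy effort-allocation model. This is a sketch under loud assumptions, not a model proposed in the source: the fixed effort budget, the score gain per hour, the per-behavior caps, and the greedy allocation rule are all hypothetical, as are the names `allocate_effort` and `relevant_hours`.

```python
CONTAMINATION = "be liked by supervisor"


def allocate_effort(budget, behaviors):
    """Greedily spend a fixed effort budget on the behaviors with the
    highest score gain per hour, up to each behavior's cap."""
    plan = {name: 0 for name in behaviors}
    remaining = budget
    ranked = sorted(
        behaviors.items(),
        key=lambda item: item[1]["gain_per_hour"],
        reverse=True,
    )
    for name, spec in ranked:
        hours = min(remaining, spec["max_hours"])
        plan[name] = hours
        remaining -= hours
    return plan


def relevant_hours(plan):
    """Total hours spent on everything except the contamination behavior."""
    return sum(h for name, h in plan.items() if name != CONTAMINATION)


# Hypothetical payoffs for Factors 1, 2, and 3a.
relevant_only = {
    "improve test preparation (1)": {"gain_per_hour": 1.0, "max_hours": 20},
    "improve graduation preparation (2)": {"gain_per_hour": 0.8, "max_hours": 20},
    "improve student writing (3a)": {"gain_per_hour": 0.6, "max_hours": 20},
}

# Same teacher once a biased supervisor rating makes ingratiation pay off.
with_bias = dict(relevant_only)
with_bias[CONTAMINATION] = {"gain_per_hour": 1.5, "max_hours": 15}

before = allocate_effort(40, relevant_only)
after = allocate_effort(40, with_bias)
print(relevant_hours(before), relevant_hours(after))
```

With these made-up numbers, hours devoted to Relevant Factors fall from 40 to 25 once the easier Contamination CSB becomes available, which is the decrease the argument predicts.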
The effects of Criterion
Contamination on reductions in Criterion Relevant and Criterion Deficient
Factors of the Ultimate Criterion can be devastating, but even were these
effects somehow miraculously avoided, increases in Contamination Factors
may have other detrimental effects on the organization. The variety
of undesirable consequences that can be expected as a result of Criterion
Contaminated measures requires some elaboration.  The following
section presents many such anticipated pitfalls of evaluation
systems and attempts to shed some light on when each type of effect is
most likely to occur.
Undesirable Consequences From the Use of Criterion Contaminated Measures.
It is generally the case that Contamination CSBs are more of a problem when using subjective rating methods than when using objective testing methods to evaluate performance. This is a result of the fact that raters are highly susceptible to influences that systematically bias their evaluations of others, while such evaluator biases do not usually influence scoring of objective tests. Nevertheless, Contamination does also occur with the use of objective tests. Some of the major types of Contamination CSBs that can influence objective test scores are presented here.
Contamination Effects on Objective Tests: Contamination CSBs that are most likely when using objective tests fall into three categories:
1. Cheating. One category of Contamination CSBs includes the numerous versions of cheating, all of which refer to efforts to increase Actual Criterion Factors without increasing Ultimate Criterion Factors. The Contaminating Factors in this case include all possible illegitimate means of obtaining a high score. These include copying from someone else's test, obtaining a key of correct answers before the test, concealing answers on "cheat sheets," on parts of one's body, or in the restroom, and having someone else take the test under the name of the person to be evaluated. Most students and teachers have also encountered more original methods tailored for special testing situations. Most of these methods are more common when objective tests are being employed, but some would suffice for nearly any type of evaluation system.
2. "Cutthroat Methods."
A special variety of Contamination CSBs is applicable to any evaluation
system in which one's score, or the value of consequences contingent upon
a given score, is determined by one's performance relative to that of others.
A ranking system, and "grading on a curve," are examples of such systems.
One illegitimate way to improve one's score is to actively undermine the
performance of others whose performance is being evaluated. This
can be accomplished in many ways. I have known of students in a competitive
law school to steal, mutilate, or "misplace" law books in the library that
were essential to the adequate performance of classmates on papers or exams.
I taught at one competitive undergraduate liberal arts college where students
who used such tactics were openly referred to as "throats," short for cutthroats.
Where teachers are being evaluated for raises, promotions, or
tenure, one would hope that such potentially destructive undermining of
colleagues is rare, and perhaps it is so, but many teachers seem to know
of at least one instance where even more brutal internecine CSBs occurred.
3. Loss of Mutual Cooperation. More common, perhaps, than cutthroat tactics, but possibly more damaging to the organization, is the widespread tendency for competitive evaluation systems to discourage interdependent cooperation of group members. A system in which members believe that their own self-interest is incompatible with the interests of their colleagues is not likely to be either satisfying to individuals or productive for the organization. This can be a particularly pernicious effect in schools, where it is often the case that fellow teachers are a teacher's most valuable resource. The cost to an organization of undermining the potential for cooperation between members is far from a trivial consideration, yet seldom are such anticipated problems taken into account by those advocating new methods for evaluating teachers and making them more accountable.
The use of "merit money"
in schools will almost always be subject to the above effects, even if
the evaluation system does not appear to be of the ranking variety.
This is because the amount of money to be distributed amongst teachers
is always limited, and however performance evaluations were initially done,
for purposes of allocating merit money some form of ranking must be devised
from them. Any system in which teachers perceive that their own raise
or bonus will be less if their colleagues perform well, than if they perform
poorly, is subject to the types of undesirable consequences described.
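The zero-sum character of a limited merit pool can be illustrated with a short sketch. The proportional allocation rule, the pool size, and the scores are assumptions chosen for the example (the source says only that some form of ranking must be devised), and `merit_shares` is a hypothetical name.

```python
def merit_shares(pool, scores):
    """Split a fixed merit pool in proportion to evaluation scores.

    Assumption: proportional sharing stands in for whatever ranking
    rule a real system would use.
    """
    total = sum(scores.values())
    return {name: pool * score / total for name, score in scores.items()}


POOL = 10_000  # hypothetical fixed amount of merit money

# Teacher A performs identically in both cases; only the colleague changes.
colleague_poor = merit_shares(POOL, {"teacher_a": 80, "teacher_b": 40})
colleague_well = merit_shares(POOL, {"teacher_a": 80, "teacher_b": 80})

print(round(colleague_poor["teacher_a"], 2))
print(round(colleague_well["teacher_a"], 2))
```

Teacher A's share is smaller when the colleague performs well, even though A's own score is unchanged. That is the perception the paragraph above identifies as the source of cutthroat tactics and lost cooperation.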
The prominence of efforts
such as those described above is a result of the fact that these Contamination
CSBs are perceived as requiring less effort, or are believed to have a
high expectancy of success at achieving valued outcomes. The negative
experiences of guilt or shame for having violated laws, rules or norms
designed to further social, institutional or organizational goals, should
be expected to offset these gains for individuals. To the extent
that they are personally subject to such emotions these will be costs,
or negative utilities. The unfortunate derivation from this set of
premises is that those most likely to "succeed" in such a system are precisely
those most lacking in such scruples. As a nation that has experienced
Watergate, Iran-Contra, the HUD scandal, the savings and loan crisis, and
other debacles too numerous to name, one might question the advisability
of imposing more systems in which "good guys" are destined to finish last.
Contamination CSBs in Subjective Rating Systems. Many of the Contamination CSBs common to the use of objective tests, such as some forms of cheating or of undermining the performance of others, are also potential problems for use of subjective rating systems. Because rating systems typically depend upon long-term exposure to and knowledge of the person being rated, however, some forms of cheating are less likely than with objective tests. The types of contamination that are most often important in the use of subjective measures primarily involve rater bias.
Rater Bias. The amount
of research that has been conducted on the sources and varieties of rater
bias is staggering. This research is usually presented as a demonstration
of sources of invalidity in subjective rating systems of evaluation, all
of which undermine the usefulness of such methods for purposes of assessment
(cf. Smithers, 1988).
From the perspective of
Criterion Shaped Behavior Theory, this research provides a basis for understanding
the numerous means by which the individual being evaluated can, by increases
in Contamination CSBs, increase scores on the Actual Criterion Measure.
This class of CSBs all involve the active exploitation of those biases
that cause ratings to reflect factors other than those contained in the
Ultimate Criterion. Little has been formally written about these
effects, nor do other theories show the functional relationships between
the various types. Anyone who has ever worked in an organization
where performance evaluations were used as a basis for distribution of
valued outcomes, however, will immediately recognize many of them.
They are often known by colorful names which reflect the unacceptable or
distasteful connotations associated with them, or with those who use them.
One such category of Contamination
CSBs contains Asskissing, Brown-nosing, Bootlicking, and Sucking-Up-To-The-Boss.
In more polite language these are called Ingratiation, or Flattery.
At the extreme they may involve granting of major Favors, giving of expensive
Gifts, or Sleeping With The Boss. Where a quid pro quo is involved,
these may become Bribery. All of these involve providing the evaluator
with something of value. This may be the illusion of superiority,
the semblance of power, the enhancement of ego, sexual compliance, or more
material commodities including gifts, or money. Making any of these
available to an evaluator is a Criterion Shaped Behavior if it results
from an expectation of more favorable ratings on a performance evaluation
measure. Where a subjective element exists in the method used,
as is nearly always the case with the use of ratings, research points to
several sources of such a bias.
Exploiting the Interpersonal Attraction Bias
Ingratiation Tactics.
Considerable research shows that raters consistently favor those whom they
personally like in their performance ratings (Berscheid and Walster, 1978).
Much of this will occur unintentionally, for raters who have formed a positive
impression of an individual will be susceptible to a "Halo Effect."
People perceive and interpret new information in ways supportive of earlier
impressions. They selectively perceive, selectively store, selectively
retrieve, and selectively interpret information in a biased fashion directed
to sustaining those early impressions. People also are biased by
a belief that all positive characteristics are positively correlated with
each other (Bruner and Tagiuri, 1954), and any early attribution of positive
traits or behaviors, e.g., being likeable, increases expectations that
the liked person will also possess other positive traits, e.g., being a
good teacher. Anything one can do to increase how much he or she
is liked by the rater will potentially lead to receiving a higher score
on a subjective measure of performance.
The tactics listed
earlier, Brown-nosing, Bootlicking, and other unsavory categories of ingratiation
(Jones, 1964) can all be viewed as Contamination CSBs designed to enhance
one's scores on the Actual Criterion Measure by increasing one's interpersonal
attractiveness to the rater. The means by which this can be accomplished
are delineated in any recent textbook on social psychology (cf. Brehm
and Kassin, 1990).  A broad generalization of the many separate influences
is that we like others to the extent that they have been associated with
our own positive experiences (Byrne, 1971). If one has good feelings
because of, or merely in association with, a person, he or she will tend
to like that person. If one can enhance such good feelings in a rater,
by flattery, by gift-giving, etc., one can increase liking and consequently
how well one is rated on subjective measures of performance. To the
extent that it is easier to increase one's score by use of such Contamination
CSBs, one will be less likely to expend the effort required for real improvements
in performance, i.e., Relevant CSBs.
Exploiting the Similar-To-Me-Bias.
Perhaps the most thoroughly documented causal factor in interpersonal attraction
is perceived similarity. People like best others who they perceive
to have attitudes, beliefs, lifestyles, and group memberships similar to
their own (Byrne, 1971). Because similarity is not merely a correlate
of attraction, but a cause of it, anything that increases an evaluator's
perception that the person being rated is similar to himself, or herself,
is likely to produce an increase in that evaluator's performance ratings
(Wexley and Yukl, 1977). Efforts to increase perceived similarity
are, therefore, Contamination CSBs.
Such behaviors will not
always be undesirable, in and of themselves. In fact, encouraging
the subordinate to become more like the supervisor is the essence of role-modeling
and mentoring, both of which can be quite desirable to the organization.
The problem is that the Similar-To-Me-Bias will operate also for traits
or characteristics that are irrelevant to or are even negatively related
to the Ultimate Criterion, and that efforts to increase perceived similarity
may interfere with Criterion Relevant behaviors.
If the boss spends all weekend
watching football games and smoking cigars, for example, it is not evident
that increases in such behaviors will necessarily benefit the organization.
And if the supervisor is a member of the local Elks lodge, it is to be
expected that subordinates whose promotions are believed to be dependent
upon being liked by that supervisor will be motivated to join also.
In this last example, it is ironic that the legal system has at times recognized
the barring of women from such organizations as unfair discrimination.
While the legal argument typically focuses on women having reduced opportunities
to engage in effective business transactions, in our terms women have been
unfairly prevented from certain opportunities to exploit the personal biases
of those engaged in subjective ratings of their performances. Were
there no Criterion Contamination, and no Contamination CSBs, such discrimination
would be less of a factor.
Exploiting the Reciprocity of Liking Bias. A norm of reciprocity is a powerful influence on a wide variety of interpersonal behaviors. Some view the norm as universal in its application (Gouldner, 1960). One consequence of the operation of this norm is that individuals reciprocate liking, just as they reciprocate favors, gifts, insults and threats. It is not simply that we like others who like us, but, at least to some extent, we like others because they like us; or more specifically because we believe they like us. Ingratiation often involves the attempt to improve one's rating by pretending to like the rater. This Contamination CSB may be relatively harmless, for it probably does not interfere much with actual performance, but as is the case with many CSBs, it can have other detrimental impact on the organization. For example, when the disingenuous subordinate is rewarded, less deceitful employees may feel inequitably treated, and the resulting dissatisfaction and social friction may indeed interfere with productivity.
Exploiting the Reciprocity
of Behavior Bias. The norm of reciprocity (Gouldner, 1960) requires
that anything received from another be returned to them, if not in kind,
at least in value. In theories of social exchange (Thibaut &
Kelley, 1959), the commodities of exchange include both material ones,
such as money, and social or symbolic ones, such as approval or status.
As a norm, reciprocity reflects both an expectation and a social requirement.
Violation of the norm invokes both privately felt unease and socially mediated
negative sanctions. As a result, one powerful means of increasing
the score one is given on a subjective rating, a valued outcome, is to
previously give something of value to the rater. What one gives can
be anything valued by the rater, including approval, status, money, sex,
etc. Having received something of value, the rater will feel an obligation
to return something of value to the giver. One such commodity is
an inflated, i.e., biased, score on the performance evaluation.
An apple for the teacher,
a donation to a congressman, pretended admiration of the boss, a case of
Scotch for the purchasing agent's Christmas present, inviting the principal
to dinner, and giving your unused NBA tickets to the superintendent are
all potential means of influencing another's subjective decision relevant
to one's own valued outcomes. All of these are Contamination CSBs,
for all can produce an increase in one's performance score without increasing
factors which constitute the Ultimate Criterion.
The undesirability of such Contamination CSBs is both a result
of decreases in desirable behaviors in the relevant and deficiency sectors,
and of inherently undesirable effects arising from such CSBs themselves.
While some such effects may be trivial, others involve wholesale attempts
to subvert the official system and to replace it by one based upon such
concerns as who owes whom how much, rather than upon whose performance best
promotes the goals of the organization.
The reciprocity norm can
function with or without the explicit conscious awareness by the rater
of having been so influenced. In extreme cases, however, such awareness
does exist, and the rater knowingly participates in the illicit practice
of exchanging favorable ratings for valued outcomes. Though the distinction
between being a dupe and being a conspirator is often difficult to define,
bribery or quid pro quo exchanges are usually considered to be unethical
or illegal, and evaluators found guilty of such conscious participation
are subject to punishment or dismissal. In fact, many forms of Contamination
CSBs are viewed as unethical, or illegal, including the previously mentioned
varieties of cheating. Clearly, a system in which such behavior is
successful at increasing performance scores used for allocation of valued
outcomes is subject to many undesirable consequences.
Reciprocity occurs for exchange
of negative as well as positive outcomes. People dislike, hate, or
aggress against others who are believed to dislike, hate, or aggress against
them. In a system where everyone evaluates everyone else, one may
fear that to give others negative evaluations may increase the chances
of being negatively evaluated in return. An expectation of reciprocity
may lead a teacher to give inflated grades to students who, later in the
semester, will be asked to evaluate the teacher. I believe one would
find a relationship between the increased use of student evaluation of
teachers, in the 1970s, and the widely recognized trend toward higher grade
point averages of students, in that same period of time, known as grade
inflation.
In peer review, giving a
negative evaluation to a colleague's performance may rightly be expected
to result in that colleague giving the rater a lower evaluation in return,
hence contaminating the rating process with a well known positivity bias.
As with all of the other reciprocity effects, this may occur with or without
conscious awareness of the evaluator.
Additional Contamination
CSBs. Of the many potential sources of Criterion Contamination,
all of which are problematic from an assessment point of view, some more
easily lend themselves to the shaping, or manipulating, of raters' behaviors
than do others. For example, the "similar to me" bias may be exploited
by taking up the boss's hobby, or by adopting the rater's political views,
but cannot be exploited by becoming the same sex, race, age, ethnic group,
etc. As such, Contamination CSBs only occur related to biasing factors
amenable to change. So far we have looked primarily at those involving
liking and reciprocity, but other means, not involving the relationship
to the rater, but rather the limitations of the rater as an information
processor, also exist.
Exploiting the "Opportunity
Bias." From the assessment point of view it would be contaminating
to evaluate the performance of two individuals on the same basis, if in
fact they were provided with unequal opportunities to perform well.
For example, rating principals in terms of how many of the school's students
go on to college, would be clearly unacceptable if one had a school in
a poverty neighborhood, and the other had a wealthy suburban school.
Rating teachers based on performance of their students poses the same difficulty
when teachers have classes differing in student ability, in class size,
in content difficulty, etc. Where the evaluation system has not eliminated
such opportunity bias, it is sometimes possible to exploit it to one's
own advantage through Contamination CSBs.  As a college professor,
I have observed a reluctance on the part of colleagues to volunteer to
teach the most demanding courses in the year prior to a tenure decision,
to teach the high enrollment introductory course before their
promotion, or to teach courses that meet at 8:00 A.M. or on Friday
afternoon, because students will like such courses less well and hence rate the
professor less favorably.  For the student, taking easy courses from
lenient professors is a tried and true means of improving one's grade point
average. Efforts directed at maximizing performance scores by improving
one's opportunities are a type of Contamination CSB. Where such efforts
detract from the pursuit of organizational goals, they are undesirable,
and where they reward the whining and insistent complainer, and punish
the cooperative team player, they promote the least desirable behaviors
in those being evaluated.
Exploiting Information
Processing Errors. Raters not only fail to adequately take into
account the differences in opportunities afforded to those being evaluated,
but they are subject to an array of well documented errors which systematically
bias performance evaluations. Some of these also can be exploited.
For example, ambiguity, confusion, lack of information, and uncertainty
all increase the impact of a "central tendency bias," i.e., a tendency
to give someone an "average" score. An average score is good for
an inadequate performer, hence such an individual will be motivated to
create conditions in which the bias will operate. An inadequate teacher
may not invite colleagues or supervisors to sit in on classes, may not
give students a chance to evaluate the course, and may otherwise, through
what I call the "politics of obfuscation," make an informed evaluation
of performance impossible. Under such circumstances, a bias towards
the average is more likely, and is clearly desirable from the point of
view of the inadequate performer. The other side of this is that
the best performers will be eager to provide such information to ensure
that they are not rated as average, but as above average. Both are
Contamination CSBs, and both may take away from time or effort that might
have been expended in improving actual performance.
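The incentive created by the central tendency bias can be sketched with a simple blending model. The linear form, the 0-to-10 rating scale, and the particular numbers are assumptions made only for illustration, and `expected_rating` is a hypothetical name; the source does not specify any functional form for the bias.

```python
def expected_rating(true_performance, group_average, information):
    """Blend true performance with the group average.

    information = 1.0 means the rater is fully informed;
    information = 0.0 means the rater knows nothing and rates "average".
    Assumption: a linear blend stands in for the central tendency bias.
    """
    if not 0.0 <= information <= 1.0:
        raise ValueError("information must be between 0 and 1")
    return information * true_performance + (1.0 - information) * group_average


AVERAGE = 5.0  # hypothetical group average on a 0-10 scale

# An inadequate performer (true level 2) gains from keeping raters uninformed.
weak_informed = expected_rating(2.0, AVERAGE, information=0.9)
weak_obscured = expected_rating(2.0, AVERAGE, information=0.2)

# A strong performer (true level 8) gains from keeping raters informed.
strong_informed = expected_rating(8.0, AVERAGE, information=0.9)
strong_obscured = expected_rating(8.0, AVERAGE, information=0.2)

print(weak_informed < weak_obscured, strong_informed > strong_obscured)
```

Under this sketch the two performers face opposite incentives about how much information to supply, which matches the claim that both obfuscation and eager self-documentation are Contamination CSBs.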
Among other information
processing biases that can be exploited are the primacy effect,
and the previously mentioned halo effect. If it can be arranged
such that one's best performances are made known early, the overall rating
will tend to be higher than if one's poorest performances were made known
first, even though all performances are made known prior to evaluation.
Likewise, an early positive impression based upon something other than
performance, e.g., physical attractiveness, will increase subsequent ratings
on other dimensions, including performance. I tell my students, concerning
applying for a job, to make their best qualities known first. If
they write well but are physically unattractive, they should introduce
themselves first by letter. If they have a lovely voice they might
phone. If they are physically attractive they should first appear
in person. These impression management tactics are Contamination
CSBs dependent for their effectiveness on information processing biases
of evaluators.
One other bias capable of
exploitation is the "availability heuristic," according to which
one's overall impression is shaped inordinately by relevant instances most
readily accessible to memory. One can make a successful performance
more accessible by attaching it to other memorable information, by "showcasing"
it in some dramatic fashion, or by constant reminders of its occurrence.
One instance of prior success, so highlighted, may be more persuasive than
ten unmemorable failures.  Getting one's successes into the paper ("publicity"),
or otherwise making them readily available to memory through "grandstanding,"
while obscuring one's failures, qualifies as a Contamination
CSB.
The importance of Contamination CSBs is that where they are perceived to be more efficient at achieving valued outcomes than are Relevant CSBs, they will lead to a reduction in desired behavior. Furthermore, widespread occurrence of most of the described behaviors in an organization is likely to have other potentially devastating effects. The lack of cooperation, the interpersonal enmity, the prevalence of cheating, stealing, and brown-nosing, the poor morale, and the job dissatisfaction often reported in educational as well as in other institutions can perhaps be attributed to the Contamination CSBs likely to develop when highly valued outcomes are made contingent upon measures that are Criterion Contaminated. Without evidence of powerful benefits also achieved by such performance evaluation, one might conclude that these are very high prices to pay.