Journal of Consulting and Clinical Psychology © 1991 by the American Psychological Association, Inc.
April 1991 Vol. 59, No. 2, 233-244

Treatment of Aptitude × Treatment Interactions

Bradley Smith
University of Arizona
Lee Sechrest
University of Arizona
ABSTRACT

The main effects in psychotherapy research have been smaller than expected. Rather than concluding that psychotherapy has weak effects, clinical researchers have argued that average effect sizes are reduced because of mismatches between clients and treatment. Hence, Aptitude × Treatment interaction (ATI) research has been viewed as a promising new frontier in psychotherapy research. If ATI research is to become a productive and progressive program, then researchers will need to focus their attention on interesting and meaningful ATIs. This will require greater theoretical precision and a stronger emphasis on construct validity. Specific issues addressed in this article include Type II and Type III errors, manipulation checks from both the patient and practitioner perspective, considerations of the strength of treatment, the need to test rival hypotheses, and the desirability of collaborative research.

Psychotherapy outcome research is a frustrating business. Experiments are not easy to arrange and control, outcomes are difficult to measure, often even to define, and results are often disappointing. Psychotherapy ought to work better than it appears to. One possible explanation for why it does not is that the effects of psychotherapy depend on specific characteristics of patients and the therapies to which they are exposed; that is, effects of psychotherapy may depend on Aptitude × Treatment interactions (ATIs; Cronbach & Snow, 1977). More specifically, the ATI hypothesis states that appropriate matching of patients with treatment will result in better outcomes. This hypothesis has been optimistically interpreted by many clinical researchers to mean that ATI research can uncover psychotherapy effects that, compared with main effects, are stronger and more reliable. Unfortunately for the optimists, as we will assert and try to account for in this article, compared with main effects, ATIs in psychotherapy research may be infrequent, undependable, and difficult to detect. The purpose of this article is to take a sober look at the realities and probable impact of ATI research in terms of psychotherapy theory and practice. The ATI approach is not a quick fix to the problem of disappointing results in psychotherapy research. This article outlines a variety of stringent conditions necessary for adequate ATI research. Ironically, if our recommendations are heeded, it is likely that subsequent research will uncover previously "hidden" main effects more frequently than interactions.

What Is an ATI?

To discuss Aptitude × Treatment interactions, it is necessary to agree on just what is meant by such an interaction. We do not have in mind the purely arithmetic fact that an interaction refers to a multiplicative rather than merely an additive effect of two or more variables, although people without much quantitative training often imagine that when two variables must be taken into account, that implies an interaction. Rather, we are concerned with the fact that an Aptitude × Treatment interaction may be manifest in different ways with entirely different implications. It needs to be noted, however, that demonstration of an interaction requires a minimum of four data points. It is not enough to show that Z therapy is superior to Y therapy with depressed clients in order to infer an interaction. One must show that for some other condition, say clients with anxiety, the difference between Z and Y therapy is either smaller or larger than for depressed patients. An interaction is always specific to particular contrasts. In our example, the contrast is with respect to the problem (psychopathology), but the contrast could be with respect to personal characteristics of patients (e.g., sex), characteristics of therapists (e.g., experience), circumstances of treatment (e.g., voluntary—involuntary), or site of treatment (e.g., inpatient—outpatient). Although Aptitude × Treatment interactions are typically defined as involving contrasts in patient characteristics, that limitation need not apply. The ideas and principles are the same. In essence, however, it is important to understand that the occurrence of an interaction implies a limitation on generalizability of effects of treatments. On the other hand, an interaction also implies a basis for optimism in that the treatment under study works better with some persons or under some conditions than under others.
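To make the minimum requirement concrete with hypothetical numbers (ours, not drawn from any study): suppose mean improvement scores of 0.8 for Z and 0.5 for Y among depressed clients, and 0.6 for both therapies among anxious clients. The interaction is the difference of differences among the four cell means,

$$ I = (\bar{X}_{Z,\text{dep}} - \bar{X}_{Y,\text{dep}}) - (\bar{X}_{Z,\text{anx}} - \bar{X}_{Y,\text{anx}}) = (0.8 - 0.5) - (0.6 - 0.6) = 0.3, $$

so Z's advantage is specific to depressed clients. With only the first two means in hand, no interaction could be inferred at all.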

The existence of an interaction does not necessarily mean that circumstances exist under which the two treatments are equal, or that if one is better for one thing, the other must be better for something else. Interactions can be disordinal or ordinal (Cronbach & Snow, 1977). In the case of disordinal interactions (panel 1 of Figure 1), the lines connecting the like treatment conditions cross. This, for some reason, is often taken as firmer evidence for an interaction than a "mere" ordinal interaction in which the lines do not cross (at least not within the range of variables studied; see panel 2 of Figure 1). If statistical significance means anything, it surely means that a significant ordinal interaction is just as dependable as a similarly significant disordinal interaction. Some of the mythical superiority of disordinal interactions may be an artifact of the use of statistical models that give priority to main effects. Partialing out main effect variance has differential effects on disordinal and ordinal interactions, and, as a consequence, ordinal interactions may simply not be statistically significant after main effect variance is removed.
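The differential penalty is easy to demonstrate numerically. The following sketch (Python with NumPy; the cell means are hypothetical, not taken from any study cited here) decomposes two 2 × 2 patterns of cell means into main-effect and interaction components. The ordinal pattern's departure from additivity is largely absorbed by the main effects, whereas the disordinal, crossing pattern is carried entirely by the interaction term.

    # Decompose 2 x 2 cell means into grand mean, main effects, and the
    # interaction residual mu_ij - mu_i. - mu_.j + mu_.. (hypothetical means).
    import numpy as np

    def decompose(cells):
        grand = cells.mean()
        row = cells.mean(axis=1, keepdims=True) - grand   # aptitude main effect
        col = cells.mean(axis=0, keepdims=True) - grand   # treatment main effect
        interaction = cells - grand - row - col
        return row, col, interaction

    ordinal = np.array([[1.0, 1.0],    # therapy Y: same outcome in both groups
                        [1.0, 3.0]])   # therapy Z: much better for one group
    disordinal = np.array([[1.0, 3.0],
                           [3.0, 1.0]])  # the treatment lines cross

    for name, cells in (("ordinal", ordinal), ("disordinal", disordinal)):
        row, col, inter = decompose(cells)
        main_ss = 2 * (row ** 2).sum() + 2 * (col ** 2).sum()
        print(name, "main-effect SS:", main_ss, "interaction SS:", (inter ** 2).sum())
    # ordinal:    main-effect SS 2.0, interaction SS 1.0
    # disordinal: main-effect SS 0.0, interaction SS 4.0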

Moreover, under some circumstances, apparent ordinality may simply reflect limitations on the representations of independent or dependent variables (Cronbach & Snow, 1977). For example, consider the apparent ordinal interaction depicted in panel 3 of Figure 1. For therapists with 2 years of experience, therapy modalities Y and Z do not differ in effectiveness, but for therapists with 4 years, Z is superior. But consider the dotted lines extending to the level of 0 experience, a level not tested in the experiment. The depiction suggests that the ordinal interaction actually observed would have been disordinal had different levels of the experience variable been included. For inexperienced therapists, Therapy Y would be a better choice. In fact, had only 0 and 2 years of experience been studied, one would have observed quite a different ordinal interaction with Y superior at the lowest level and Y and Z equal at the highest (2-year) level.

The interaction "results" displayed in Figure 1 also permit another observation: the pernicious potential of the inclination to connect two data points by a straight line. The line drawn in panel 4 almost cries out for the interpretation, "With increasing therapist experience, Z is increasingly the intervention to be recommended." That cry should be resisted for reasons portrayed in panel 4, where it is evident that lines of almost any sort may connect two data points. Nevertheless, a straight line is the best guess when only two data points are available. Therefore, anyone who intends to model growth and change should begin with a minimum of three data points. It may be that some ATIs can be measured only by nonlinear models.

Varied Interaction Effects

Interactions may be manifest in treatment effects in different ways with quite different implications. These are worth considering with some care.

First, and most commonly, an Aptitude × Treatment interaction is taken to refer to the greater effect of a treatment in the presence of some characteristics than others. For example, systematic desensitization is relatively more effective for patients with phobias than for patients with obsessive-compulsive disorders; certain forms of behavior modification are more helpful for obsessive-compulsive disorders than for phobias. Underlying such observations is the implicit idea of some absolute effect of therapies, a qualitative difference between them. A medical illustration of this type of interaction would be the use of quinine to treat malaria versus the use of penicillin to treat a bacterial infection.

A second way in which an Aptitude × Treatment interaction might be manifested, however, is in the relative efficiencies of two treatments. Whiskey is a "stronger" drink than beer, but that simply reflects the concentration of alcohol per fluid ounce. Beer is just as intoxicating as whiskey if one consumes enough of it. Similarly, two therapies could conceivably differ not in terms of the terminal effect but in the "dose" required to get there. One needs to be cautious in interpreting findings comparing 8 sessions of Therapy Y with 8 of Therapy Z. If Y takes 16 sessions to achieve the same effect as 8 Z sessions, that might make Z a preferable therapy (e.g., in terms of cost-effectiveness; Scott & Sechrest, in press), but that is not an absolute difference between the outcomes of the two therapies. Moreover, psychotherapeutic processes are not necessarily affected by the rate of change. Therefore, some ATIs may reflect differences in the rate of change rather than a qualitative difference in process and outcome.
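The dose argument reduces to simple arithmetic. A sketch with hypothetical session counts and per-session costs (ours, purely for illustration):

    # Y is assumed to need 16 sessions, Z only 8, to reach the same outcome.
    sessions = {"Y": 16, "Z": 8}
    cost_per_session = {"Y": 80.0, "Z": 120.0}   # hypothetical prices
    for therapy, n in sessions.items():
        print(therapy, "cost to criterion:", n * cost_per_session[therapy])
    # Y: 1280.0, Z: 960.0 -- Z is cheaper to the same end point even at a
    # higher per-session price, yet the terminal outcomes do not differ.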

A third possibility is that Aptitude × Treatment interactions may be specific to the outcome measure(s) chosen. Therapy Z may produce more improvement in self-esteem than Y for female patients but not for male patients, with no difference observable on any other outcome measure. An obvious implication is that researchers should use multiple outcome measures, and interpreters and users of research should be cautious in inferring the existence of generalized Aptitude × Treatment interactions without considering the possibility of an Aptitude × Treatment × Outcome interaction. It should be noted here, however, that a recent attempt to identify different outcomes for different treatments of depression failed to find any Treatment × Outcome interactions (Imber et al., 1990). Although Imber et al. did enter a client aptitude (level of depression) into their analyses, that null result could reflect any of several problems, such as limited statistical power or imprecise measurement, or the simple possibility that there was no interaction effect.

Who Will Be Affected by ATI Research?

What are the boundaries of what constitutes "treatment" for the purposes of discussing Aptitude × Treatment interactions? If we consider the entire panoply of interventions relevant to the domain of "psychotherapy," then it is manifestly absurd to think that there are no ATIs (e.g., see Lazarus, 1990 ). It is now widely conceded that behavioral intervention involving exposure to the stimulus is the treatment of choice for phobias; however, such interventions would be irrelevant at best for a wide range of other conditions. For example, it is doubtful that simply exposing depressed patients to depressing stimuli will make them any better. Matching different problems with differentially effective treatments automatically makes for an interaction. Behavioral interventions tend by their very nature to be tailored somewhat to the requirements of different problems so that their use assumes an ATI. The recommendation of breathing retraining for hyperventilation is so obvious an instance of an assumed ATI that it is not even interesting; no one would suppose that breathing retraining would be useful in treating bruxism or brutomania.

On the other hand, if psychotherapy has no specific effects, if it is all just a matter of the quality of the relationship, then the search for ATIs is similarly diminished in interest. The search would be reduced to the search for therapist—patient matching variables that would foster the development of a high quality therapeutic relationship. Requirements for matching therapists and patients are not of great practical interest because they are difficult to meet under most conditions of practice. Therapists in agencies are usually under pressure to deliver services and do not often have the luxury of declining to treat a person because of less than optimal matching. Therapists in independent practice are under similar pressures, albeit for somewhat different reasons.

We assume that the interest in ATIs reflects assumptions about strategic and tactical options potentially open to therapists (i.e., that if a therapist knew of ATIs, the therapist could capitalize on them by optimal behavior). Interest in ATIs assumes that therapists are capable of planful flexibility in their decisions about how to approach cases. If the ATI reflects differences in the effectiveness of two or more modalities of therapy (e.g., as appears to be the case for treatment of phobias), the ethical therapist has either to master multiple modalities or decline to treat a patient nonoptimally and refer the patient to a therapist competent in the better modality. In actuality, most therapists probably consider themselves generalists and try to treat almost every patient who enters their office. Thus, ATIs may prove to be more interesting in theory development than in actual practice.

A medical analogy can further elucidate the complexity of treatment decisions in the context of ATIs. A cardiologist would not attempt to treat a patient presenting with a sore knee but would refer the patient to an orthopedist (another modality). The orthopedist might conclude that the first challenge would be to reduce swelling and inflammation and might prescribe an anti-inflammatory drug, rather than a painkiller. That tactical decision would involve an ATI. The physician's overall initial strategy might be to rely on natural healing processes on the assumption that the problem was caused by a severe sprain with no critical tissue damage. That strategic approach would involve an assumed ATI. If the problem were caused by tearing of the anterior cruciate ligament, then natural healing processes would produce an unsatisfactory outcome, and surgery would be recommended. The ATI would be a Diagnosis or Problem × Treatment interaction.

The same orthopedist might be faced with another patient who could be considered for a total knee replacement. The doctor might decide that the patient's age and lifestyle would not justify such a radical procedure. With a young but not highly active patient, watchful waiting could be a better treatment choice if a few years of moderate disability were rewarded by improvements in prosthetic technology during the waiting period. That would be a Patient Characteristic × Treatment interaction. If alternative treatments for a damaged knee involved a trade-off between discomfort and reduced mobility, that would be a Treatment × Outcome interaction.

In order to exploit ATIs when they are discovered, it is essential to know exactly how the treatment works and what the mechanisms are. Any ATI is observed in a particular context of therapists, patients, problems, circumstances, and so on. If we are to know how the observed ATI is to be applied to some new set of conditions, then we must know exactly the nature of the interaction in the first place. Our current understanding of most psychotherapies and behavioral interventions is scarcely more sophisticated than would be represented by a description of a medical "treatment" as "some red pills." A pharmacotherapy is not regarded as completely satisfactory until its specific mode of action is understood (i.e., which chemical operating on what structure to produce what response). Acetylsalicylic acid (aspirin), for example, was a useful therapeutic for centuries, but the medical community was never completely comfortable with it until researchers began, recently, to understand its various modes of action.

When one considers that a century after the invention of psychotherapy major disagreements still exist about such a fundamental issue as whether there are any specific treatment effects (e.g., Lazarus, 1990 ; Strupp, 1989 ), one can believe that we are still at the early aspirin stages in our understanding. Even if ATIs were found, we would not be in a position to interpret them correctly and exploit them in other than the most direct empirical way. In general, as we will elaborate on later, we expect psychotherapy theorists to benefit more from ATI findings than will practicing therapists.

Complexities and Difficulties in ATI Research

Common sense suggests that there should be at least some Aptitude × Treatment interactions (ATIs) in psychotherapy. The concept of the ATI has received serious attention in the educational psychology literature (see Cronbach & Snow, 1977) and has been proposed as an important, if not essential, strategy for research in all areas of psychology (e.g., Cronbach, 1975). Nonetheless, very little systematic work on ATIs in psychotherapy has been done, and very few replicable ATIs have been reported.

An important reason ATI research has not become a reality is that measuring and interpreting interaction effects is much more difficult than dealing with main effects. ATI research requires greater precision than general effects research. Rather than comparing general packages of treatment delivered to a broad class of patients, in ATI studies one needs to know precisely what it is about the patient that interacts with a precisely defined component of treatment. Interaction effects need to be shown to occur above and beyond the additive influence of main effects, and this requires studies with large sample sizes and at least four treatment cells. Thus, compared with the search for main effects, research on interactions requires better measurement, more subjects, a wider variety of conditions, and specific a priori hypotheses.

To complicate further the problems of ATI research, traditional statistics are relatively insensitive to interaction effects ( Wahlsten, 1990 ). Some of the statistical disadvantage in interaction research arises from the tendency of scientists to consider main effects before interaction effects. This preference is arbitrary, but it supposedly promotes parsimony and has become the accepted rule (actually it may be more parsimonious to propose one interaction instead of two main effects). Some consequences of the "parsimonious approach" are these: (a) Interactions have to be gleaned from the variance left over after main effects are removed, which reduces the likelihood that interactions will be significant; (b) the magnitudes of interaction effects may often be underestimated; and (c) the magnitudes of main effects are often overestimated. We are in general agreement with Dawes's (1988) point that interactions may not be very impressive after main effects are allowed for. But researchers who think interactions are important should understand the handicap they place themselves under in using analytical models that look for interaction effects only in the residuals from main effects predictions. Analyses of variance (ANOVAs) have that unhappy property, and the search for interaction effects might better be carried out by General Linear Models involving regression methods and individual growth curve analyses.

The search for ATIs is also impeded by the inclination of many investigators to convert continuous measures to categorical measures, thereby sacrificing critical information. That inclination probably owes something to a preference for ANOVA statistics and to the custom of graphing interactions by the mean values of individual cells. What Cronbach and Snow (1977) refer to as "ATI regressions" are simply plots of regressions. These have the advantage of showing the shape of the entire function (e.g., revealing nonlinearity if it exists). Interactions will ordinarily be seen as differences in slopes of regression lines, although Cronbach and Snow note that interactions may occasionally affect variances. Not only are regression analyses likely to have greater power for detection of interactions (Cohen & Cohen, 1983) but they are certain to be more sensitive to other features of the phenomena under study.
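As a sketch of the regression alternative, the following Python code (simulated data; the statsmodels library; variable names and effect sizes are our own assumptions) tests an ATI with the aptitude kept continuous and then shows the cost of a median split:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    n = 200
    treat = rng.integers(0, 2, n)    # 0 = therapy Y, 1 = therapy Z
    apt = rng.normal(0, 1, n)        # continuous aptitude score
    y = 0.3 * treat + 0.4 * apt + 0.5 * treat * apt + rng.normal(0, 1, n)
    df = pd.DataFrame({"y": y, "treat": treat, "apt": apt})

    # ATI as a regression interaction, aptitude left continuous.
    full = smf.ols("y ~ treat * apt", data=df).fit()
    print(full.params["treat:apt"], full.pvalues["treat:apt"])

    # Median-splitting the aptitude discards information and, typically, power.
    df["apt_hi"] = (df["apt"] > df["apt"].median()).astype(int)
    split = smf.ols("y ~ treat * apt_hi", data=df).fit()
    print(split.pvalues["treat:apt_hi"])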

Few psychotherapy studies are planned to reveal interactions, and most of them have inadequate statistical power to detect interactions, especially given the preference researchers have shown for ANOVAs. Cronbach and Snow (1977) suggest, for example, that an ATI study with subjects assigned randomly to groups should have about 100 subjects per treatment, a sample size much larger than that of almost any psychotherapy study. Blanchard, Appelbaum, Radnitz, Morrill, et al. (1990), in what is a fairly representative example of therapy research, had only 116 cases to allocate to four treatment conditions. It is not surprising that they found no differences among the three active treatments, although two actually had an odds ratio of better than 1.7 for producing improvement when compared with the third. Another study of treatment of a small sample of elderly patients found no differences between treatments (Thompson, Gallagher, & Breckenridge, 1987), although at the end of treatment "major depression still present" was twice as frequent in the cognitive as compared with the behavior therapy group. The possibility in such studies of detecting an interaction indicating that one of the treatments would be better than the others with a particular type of patient is virtually nil.

Unfortunately, power analyses are rarely reported for psychotherapy studies of any kind, an omission that is going to have to be corrected, but that is going to prove painful. Journal editors must begin to insist on properly done power analyses. Power analyses must be done before therapy studies are undertaken if they are to be useful. After-the-fact analyses permit capitalization on chance to a considerable, although unknown, degree because all the estimates must necessarily be considered biased.
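Power for an interaction can at least be estimated by simulation before a study is run. A minimal sketch (Python; all effect sizes are assumptions for illustration, not estimates from the studies cited above):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    def interaction_power(n_per_cell, inter=0.4, sd=1.0, alpha=0.05, reps=500):
        """Share of simulated 2 x 2 studies whose interaction term is significant."""
        rng = np.random.default_rng(1)
        a = np.repeat([0, 0, 1, 1], n_per_cell)    # aptitude group
        t = np.repeat([0, 1, 0, 1], n_per_cell)    # treatment
        hits = 0
        for _ in range(reps):
            y = inter * a * t + rng.normal(0, sd, 4 * n_per_cell)
            df = pd.DataFrame({"y": y, "a": a, "t": t})
            hits += smf.ols("y ~ a * t", data=df).fit().pvalues["a:t"] < alpha
        return hits / reps

    for n in (15, 50, 100):   # Cronbach and Snow suggest roughly 100 per treatment
        print(n, interaction_power(n))
    # Even at 100 per cell, power for this moderate interaction is only about .5.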

With the exception of the above-mentioned statistical considerations and the minimum requirement that two treatments and two aptitudes must be compared, the difficulties of ATI research are basically similar to those faced in all types of psychotherapy research. Many of these issues are discussed elsewhere in great detail; however, it serves the purpose of this article to remind ATI researchers not to perpetuate problems that continue to plague psychotherapy research. As mentioned earlier, simply looking for ATIs will not automatically result in psychotherapy research successes. On the contrary, the ATI approach may exacerbate past problems that, if left uncorrected, could lead to dismal failure. Some of the more troublesome of these research liabilities are discussed later in this article.

Do ATIs Really Exist?

Before discussing possible methodological problems that have made ATIs elusive, it is important to consider the possibility that ATIs do not exist, or at least that they are rare. Virtually every comprehensive analysis of psychotherapy outcomes (e.g., Smith, Glass, & Miller, 1980; Landman & Dawes, 1982; Luborsky, Singer, & Luborsky, 1975) has ultimately concluded that type of therapy, experience of the therapist, credentials of the therapist, and so on are unrelated to outcome. If any client variables are consistently related to outcome of therapy, they generally support no more of a conclusion than that clients who are bright, verbal, motivated, and not so bad off in the first place tend to do better in therapy. Those are main effects, rather than interactions, and not very interesting ones. These conclusions have not, however, been much of a deterrent to speculation about ATIs. For example, an ATI reported by Jacobson, Follette, and Pagel (1986) for marital therapy was not only unimpressive in size but also was found only at immediate posttherapy measurement and not at follow-up. Talley, Strupp, and Morey (1990) found an interaction between therapists' tendencies toward affiliative or hostile response and patients' similar tendencies, but only for therapist-rated improvement on a single-item scale. Other interactions were reported, but they were similarly inconsistent across independent and dependent variables.

Despite the fairly consistently negative outcomes of the search for ATIs, the search is unabated. Why? One answer may lie in what Dawes (1979) calls "arguing from a vacuum." People have a strong tendency to believe and argue that if one desired solution does not work, it must be true that something else will. Psychotherapists would like to believe that therapist experience has a good bit to do with outcome, and when that cannot be shown, the response is to believe that the answer must lie in interactions. If type of therapy does not appear to be related to outcome of therapy, then the effect of type of therapy must lie in an interaction with other variables. All that remains is to ferret out and display the interaction.

In some important ways, the persistence of psychotherapy researchers in searching for ATIs resembles the error in thinking that Dawes (1988) identifies with the commitment to "sunk costs." So much effort has been expended in the attempt to find the "silver bullet" of psychotherapy that it is simply too painful to abandon all that investment, cut losses, and try something else. As the punch line of an old joke has it, "There must be a pony in there somewhere!"

To a metascientist, the movement toward ATI research might look like a symptom of a degenerating program of research. Programs can be said to be degenerating if they (a) fail to yield new predictions or empirical successes and/or (b) deal with empirical anomalies through ad hoc maneuvers that overcomplicate rather than clarify the problem of interest (Gholson & Barker, 1985). Perhaps psychotherapy researchers should be seriously and dispassionately reconsidering the core assumptions of their theories rather than building an elaborate ATI structure on a crumbling theoretical foundation.

The disappointment over empirical outcomes notwithstanding, the case for interactions is, unfortunately, a priori discouraging. Here is why, as Dawes (1991) makes clear. If there is an interaction but no main effect for the treatment variable, then either the interaction must be very small in magnitude in relation to unexplained (error) variance or the interaction must be disordinal (as in panel 1, Figure 1 ). Now if an interaction involving, for example, therapist experience were disordinal, that would mean that experienced therapists were less helpful, and perhaps harmful, to some clients. Not only would such a conclusion seem unlikely on the face of it, but it would immediately plunge the field into serious ethical difficulties. It would clearly be unethical to assign a client to any form of intervention known to be suboptimal. Thus, many ATIs, if they exist at all, may be more confusing and troublesome than past failures to find strong main effects.
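Dawes's point can be illustrated with hypothetical cell means. If the treatment main effect and the aptitude main effect are both zero, yet an interaction of size 2d exists, the four cell means are forced into a checkerboard,

$$ \bar{X}_{Y,\text{low}} = a + d, \quad \bar{X}_{Z,\text{low}} = a - d, \quad \bar{X}_{Y,\text{high}} = a - d, \quad \bar{X}_{Z,\text{high}} = a + d, $$

so that Y beats Z for low-aptitude clients and Z beats Y for high-aptitude clients: disordinal by construction. Any ordinal pattern with a nonzero interaction necessarily carries a nonzero main effect along with it.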

Type III Errors in Psychotherapy Research

Failures of validity of experiments can lead to three types of erroneous conclusions: (a) The treatment is judged to be effective when it is not (i.e., a Type I error); (b) the treatment is judged to be ineffective when it actually is effective (i.e., a Type II error); or (c) researchers conduct the wrong experiment (i.e., a Type III error). Type III errors occur when faulty measurements, experimental designs, or conceptualization of crucial variables prohibit meaningful interpretation of experimental results. In the case of ATI studies, failures to understand aptitudes or treatments result in Type III errors.

Historically, the possibility of making Type I errors has been more carefully guarded against than the chance of making Type II errors. Meanwhile, the possibility of making Type III errors has been virtually ignored. This oversight is serious because Type III errors override both Type I and Type II errors. Who cares if a hypothesis is erroneously accepted or rejected if the hypothesis is misspecified to begin with? Therefore, to minimize and control for Type I and Type II errors at the expense of Type III errors is a fallacy of misplaced precision ( Mitroff & Featheringham, 1974 ). Most of the issues relevant to controlling and minimizing Type III errors fall under the province of what Mahoney (1978) calls "theoretical validity," which he defines as the extent to which an experiment has some logical bearing on a specific hypothesis or theory. Unfortunately, owing to the proliferation of diverse paradigms for explaining and studying human psychopathology, it is difficult to reach agreement on the minimum standards for theoretical validity. Nevertheless, some aspects of theoretical validity may be less debatable than others. For instance, determining whether a hypothesis is clearly stated a priori is much less problematic than trying to decide whether the hypothesis is theoretically relevant to a certain paradigm. Thus, even though many issues regarding theoretical validity are entrenched in specific schools of psychotherapy, it should be possible to list several methodological issues related to theoretical validity that are ubiquitous across paradigms of psychotherapy.

Concerns over Type III errors pertain to all four of the most widely recognized types of experimental validity: internal, external, construct, and statistical conclusion validity (Cook & Campbell, 1979). Of these four, statistical conclusion validity is probably the least affected by Type III errors. Nevertheless, calculations of statistical power may be dependent on theory-based estimates of treatment effect size (Scott & Sechrest, 1989). When this is the case and the wrong theory is applied or the right theory is misapplied, then a Type III error can result in a problem with estimating statistical power. Thus, statistical conclusion validity might be affected by Type III errors, although threats to statistical conclusion validity do not appear to contribute to Type III errors. Nonetheless, it should be emphasized that statistical conclusion validity issues are, on other grounds, critical to the study and interpretation of ATIs.

Proper concern with Type III errors increases as attention shifts to internal validity. The confidence with which one can assert that the outcome of an experiment is attributable to the intervention and to no other variables (i.e., internal validity) is not a purely objective deduction. Formulations of experimental problems, and as a result internal validity, depend to a large extent on the researcher's ability to conceive of and control for rival explanations of treatment effects. Determining whether an effect occurred is a relatively simple endeavor compared with the process of attributing the effect to the independent variable. It is easier to show that a treatment works than to explain how it works, and misattributing a cause to an effect is a Type III error. For example, Jacobson, Follette, and Pagel (1986) may have shown that behavioral marital therapy is more beneficial for egalitarian couples than for others (although only immediately posttherapy), but that finding is not necessarily easily explained; and simply stating that the outcome was some ATI related to the "egalitarian" qualities of the couple would be a mistake. In order to reduce problems with Type III errors, we need to know more about specific and unique qualities of concepts such as "egalitarian" and "behavioral marital therapy."

Problems associated with misunderstanding how a treatment works are especially distressing when one attempts to disseminate findings from an effective study, a shortcoming recognized by Jacobson et al. (1986) . The legitimacy of various generalizations of research findings across persons, places, and times (and other dimensions; see Cook, 1990 ) is the essence of external validity. Even though conclusive evaluation of external validity is based on empirical demonstrations of the replicability of treatment across different settings and subpopulations, most inferences about generalizability are based on theoretical interpretations of treatment. As a result, poor understanding of treatment variables can be expected to result in flawed generalization of treatment.

Of the four validities listed by Cook and Campbell (1979), construct validity is the most closely linked with Type III errors. Construct validity is completely subsumed by the concept of theoretical validity, although a few considerations pertinent to theoretical validity are not traditionally associated with construct validity (e.g., the issue of clinically vs. statistically significant change). We will not attempt to draw sharp distinctions between theoretical and construct validity, and the two terms are used interchangeably in this article. However, we favor the expression "theoretical validity" because it emphasizes the importance of theory in psychotherapy research and promotes the notion that researchers should occasionally look beyond the four types of validity listed by Cook and Campbell (1979).

Theoretical validity refers to the adequacy of our understanding of the experimental variable(s) being studied. If the variables are not operationally defined in a manner that clearly and completely represents the theoretical construct of interest, then the experiment can result in a Type III error. Likewise, if the variables are not accurately described (i.e., poorly reported), there is a risk that subsequent readers and researchers will commit Type III errors. In summary, any severe threat to construct or theoretical validity will almost certainly result in a Type III error, and these errors imply that the worth of the entire experiment is diminished, at least in terms of theoretical or practical meaning. Unfortunately, the psychotherapy literature is replete with instances of poor, and probably inaccurate, description of interventions. Type III errors (i.e., failing to understand what treatment really is) could be the primary reason why main effects research has been disappointing. If these errors are left uncorrected, Type III errors could be an even more formidable barrier to ATI research.

A final note on the theoretical validity of treatment concerns the importance of the timing of formulations of theoretical explanations. Hypotheses that are formulated before the experiment (i.e., a priori hypotheses) are presumably tested by the experiment. Hypotheses generated after inspection of the data (i.e., a posteriori hypotheses) are actually untested and may represent little more than speculation. Thus, accepting a posteriori hypotheses as fact can result in Type III errors. An example is provided by the National Institute of Mental Health (NIMH) Depression Collaborative Research Program, for which a recent analysis ( Elkin et al., 1989 ) appeared to show a reasonable, but unanticipated, ATI. More severely depressed patients got greater benefit from drug treatment than from other treatments, although no greater benefit from drugs existed for less severely depressed patients. This ordinal ATI would have been considerably more persuasive had it been predicted in advance, because compared with hindsight, a priori prediction suggests a greater understanding of the topic of interest.

Surmounting the Barriers to Adequate ATI Research

This article has described four major barriers to measuring ATI effects: (a) the need for relatively more complex designs and the ensuing practical and economic difficulties; (b) conditions that promote Type II errors, including small sample size and inappropriate statistical techniques for detecting interactions; (c) Type III errors, especially failing to understand the exact nature of treatments; and (d) the very real possibility that ATIs are infrequent and undependable. These barriers are formidable but not entirely insurmountable. The following section offers suggestions for overcoming adversity in ATI research.

Collaboration and More Appropriate Statistical Models

The requirement that experiments have designs of sufficient size, with sufficient degrees of freedom, to detect interactions is, obviously, not easy to meet, but ignoring that requirement has not gotten us, and will not get us, anywhere either. It will not do the field of psychotherapy any good simply to bemoan the fact that large sample sizes and elaborate designs are required for ATI research and then to ignore that fact in practice.

One possibility is that more studies can be carried out collaboratively. Successful completion of collaborative studies is not easy, but it is wasteful of money and effort to do studies that will not accomplish their aims. It should be recognized, however, that power to detect effects is not solely a matter of sample size (see also Higginbotham, West, & Forsyth, 1988). It may be that for some effects relaxing alpha from .05 to .10 would be justified; too much emphasis is placed on statistical significance in any case (Cohen, 1990). A second determinant of statistical power is the effect size anticipated, which may depend heavily on the strength of the intervention. Strength of intervention is often under the control of the investigative team: treatment should be planned to be strong to begin with, and lapses in the integrity of treatment should be guarded against (Sechrest & Redner, 1979). Investigators should also, if they value and predict interactions, think of giving them priority over main effects in their statistical models. Finally, statistical power is a function in part of the size of the error term. The magnitude of experimental error is also very often under control of the investigative team. Experimental error can be reduced by decreasing heterogeneity in the sample, by better measurement procedures, by greater precision in conducting the experiment, and by other maneuvers (Sechrest & Yeaton, 1981b).
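The first two of these determinants are easy to explore numerically. A sketch using statsmodels' power solver for a two-group comparison (the sample size and effect sizes are illustrative assumptions; note that halving the error standard deviation doubles the standardized effect size d, so the d = 0.8 lines also stand in for an error-reduction strategy):

    from statsmodels.stats.power import TTestIndPower

    solver = TTestIndPower()
    for alpha in (0.05, 0.10):
        for d in (0.4, 0.8):  # d doubles if the error SD is halved
            power = solver.solve_power(effect_size=d, nobs1=50, alpha=alpha)
            print(f"alpha={alpha}, d={d}: power={power:.2f}")
    # Relaxing alpha helps modestly; strengthening treatment (larger d) or
    # shrinking the error term helps far more.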

Theory-Driven Research

We believe strongly that if psychotherapeutic interventions are going to be improved substantially, and particularly if that improvement is to be derived from ATIs, better theory is going to be required, and that theory will have to have its basis in fundamental psychological research. Twenty-five years ago, Goldstein, Heller, and Sechrest (1966) proposed that psychotherapy should be, first and foremost, psychological, and that meant grounding theory and practice in the basic concepts and findings of the field. They did not believe that much progress could be made by trying to develop what would be a separate, isolated discipline of psychotherapy. Moreover, the three authors demonstrated by systematic reviews of research literature that hypotheses important to the psychotherapeutic enterprise could be derived from more basic research in the field. Despite a good bit of assent from others at the time, even some acclaim, the main thesis of Goldstein, Heller, and Sechrest has been, we think, largely ignored. Some of their ideas have been realized to some extent (e.g., the importance of generalization as an aspect of psychotherapy), but for the most part current literature on psychotherapy appears to pay scant attention to research on basic behavioral and conceptual processes. One cannot determine that, for example, cognitive therapy owes any more than the most general debts to cognitive psychology. If cognitive therapy is to develop and improve, one would think that it ought to take account of what cognitive psychologists are learning about cognition. A recent volume, Psychotherapy and Behavior Change (Higginbotham et al., 1988), which is a descendant of Goldstein, Heller, and Sechrest, provides contemporary instances of the need to apply more basic research in developing thinking and research about psychotherapy. We would observe, however, that the material reviewed and the hypotheses developed tend strongly to suggest that any major improvements in psychotherapy are more likely to be in the form of main effects than ATIs.

Multitrait-Multimethod Experimental Designs

When testing psychological theories, researchers need to determine if their operationalization of the theory is accurate and representative of the problem of interest. Theoretical validity problems need to be subjected to the rigorous methodology of the multitrait-multimethod approach (Campbell & Fiske, 1959), which has been largely ignored in psychotherapy research. This approach attempts to get at the true meaning of a measure (the construct "true score") and, at the same time, to identify and attempt to neutralize the effect of any systematic bias introduced by measurement and analytic strategies. Thanks to modern statistical procedures such as structural modeling, researchers are now in a position to undertake theory-driven determinations of convergent and divergent validity of experimental measures.
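A minimal sketch of the multitrait-multimethod logic, with simulated scores for two traits (self-esteem and anxiety) each measured by two methods; all names and error variances below are hypothetical:

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(2)
    n = 300
    esteem, anxiety = rng.normal(size=n), rng.normal(size=n)
    df = pd.DataFrame({
        "esteem_self_report":  esteem + rng.normal(0, 0.5, n),
        "esteem_therapist":    esteem + rng.normal(0, 0.7, n),
        "anxiety_self_report": anxiety + rng.normal(0, 0.5, n),
        "anxiety_therapist":   anxiety + rng.normal(0, 0.7, n),
    })
    r = df.corr()
    # Convergent validity: same trait, different methods (should be high).
    print(r.loc["esteem_self_report", "esteem_therapist"])
    # Discriminant validity: different traits, same method (should be low).
    print(r.loc["esteem_self_report", "anxiety_self_report"])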

Manipulation Checks

Once the underlying traits of experimental measures are determined, dependent variables (DVs) can be chosen that should reveal the operation of the theorized mechanisms of change. For example, if a treatment for drug users focuses on self-concept because improved self-esteem is thought to result in better recovery, then treatment evaluation should use measures of self-esteem. Assuming that the DVs are perfectly sound psychometrically (which is rarely the case), if no reliable differences are recorded for these DVs, then the construct validity of the treatment should be questioned. In the drug abuser self-esteem example, failure to observe changes in self-esteem threatens the construct validity of treatment, at least to the extent that self-esteem is critical to the overall treatment theory. The methodology of recording changes in hypothesized mediating variables has been called "manipulation checks" (Aronson & Carlsmith, 1968). In our opinion, if a psychotherapy research paper does not report manipulation checks, then the reader should be skeptical about the construct validity of treatment.
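At its simplest, such a manipulation check asks whether the hypothesized mediator moved at all. A sketch for the self-esteem example, with simulated pre- and posttreatment scores (the scale and gain are assumptions):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    pre = rng.normal(50, 10, 40)          # pretreatment self-esteem scores
    post = pre + rng.normal(4, 8, 40)     # assumed mean gain of 4 points

    t, p = stats.ttest_rel(post, pre)
    print(t, p)
    # No reliable gain on the mediator would cast doubt on the construct
    # validity of the treatment, whatever the outcome measures show.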

Manipulation checks have also been shown to be useful in studying the believability of alternative treatments, especially those intended as placebos (Dush, 1987). If, as is usually assumed, placebos have their effect through arousal of expectancies for change, then placebos that do not arouse those expectancies are not good placebos. Manipulation checks can be used to determine whether alternative treatments arouse similar expectancies, are equally believable, and so on. Alternatively, manipulation checks may reveal that alternative, nonspecific treatments were more "active" than they were intended to be. An example is provided by Blanchard, Appelbaum, Radnitz, Michultka, et al. (1990), who found that the placebo condition probably functioned much like a relaxation treatment. Elkin et al. (1989) also noted that the placebo condition in the National Depression Collaborative Research Program was not inactive.

We also think that manipulation checks should be more generally used to assess the nature of therapy as perceived by patients. For example, the National Depression Collaborative Research Program intended to compare the effects of interpersonal and cognitive behavioral therapies, for which extensive manuals were prepared in order to achieve uniformity in implementation of treatment. We are not, however, aware of any attempts to determine whether patients experienced the two therapies in any different ways (e.g., whether patients in cognitive behavior therapy viewed their therapists or therapy in any way different from the views of interpersonal therapy patients). This type of information is crucial to ATI research because patient aptitudes may have a major effect on perception of treatment and subsequent outcome. Unfortunately, with the exception of detailed process analyses on the role of patient expectancy in psychotherapy (e.g., Elliott, 1986 ), there appears to be very little information about the phenomenology of patienthood.

Dawes (1991) also makes the point that the fallback position of appealing to the importance of "the relationship" or the "therapeutic alliance" is weak unless we have measures of the relationship or alliance that predict outcome. Even then, the causal connection of the quality of the therapeutic alliance to outcome may be uncertain. The therapeutic alliance is likely to grow stronger when things are going well, and the perception of improvement (e.g., in symptoms) might well lag behind the perception of the quality of the alliance without any necessary inference of a causal relationship. In recovering from medical illnesses, people often report feeling much better or even "well" long before their relevant biomedical parameters have shown much change.

Strength of Treatment

A critical parameter in the understanding of any intervention, including psychotherapy, is the strength of the treatment. Strength is viewed here as a close analog to the strength of a drug treatment. A strong treatment of aspirin would be 10 mg; a weak treatment would be 2 mg. What would "strong" psychotherapy look like? We do not know. We can, however, conjecture as follows, allowing our description to reflect rather more common sense and consensus than actual empirical evidence. Strong psychotherapy might include

  • Doctorally trained therapist;

  • Therapist with at least 10 years' experience;

  • Therapist specialized in the type of problem involved;

  • Therapist well-versed in the empirical and theoretical literature;

  • Therapist highly regarded by peers for professional expertise;

  • Well-developed therapy protocol (manual) for the specific problem;

  • Two sessions per week in the beginning, one per week thereafter;

  • Intense sessions with minimum wasted time;

  • Specific recommendations for intersession "practice" activities;

  • Therapy of at least one year's duration if required;

  • Therapist accountable for the integrity of treatment and for its outcome.

Obviously, the characteristics listed are not likely to be orthogonal in real life (e.g., a therapist who is highly regarded is likely to have a doctorate and to have a good bit of experience). If all these characteristics were, however, descriptive of the therapy being provided to a client, we could probably all agree that the treatment should be regarded as strong. Conversely, a minimally trained, inexperienced therapist doing short-term, unfocused treatment not guided by protocol and at a low level of intensity for only a few weeks would be regarded as providing weak treatment or, at the very least, poorly understood treatment. The major problem is to understand just how strong and weak the two implementations would be and just where in between the two any other real-life instance of therapy might lie. For example, what about a new doctoral-level therapist a few months out of internship but very well-read, following a protocol, one session per week for 12 weeks? We are not even close to being able to estimate the strengths of various psychotherapeutic interventions, but that is not because the task is impossibly difficult.

Treatment strength can be quantified. One could, for example, assemble panels of experts and ask them to assign weights to different aspects of therapeutic interventions. In a study of rehabilitative efforts directed at criminal offenders, for example, Sechrest and West (1983) found that professional training beyond the master's level was not accorded any additional weight. On the other hand, time in treatment was weighted in a virtually linear fashion. Alternatively, one could, perhaps by means of magnitude estimation techniques, ask experts to imagine "ideal" treatment and no treatment and then to assign a globally descriptive number to a description such as one of the above. We have regularly used a class exercise involving such judgments of smoking cessation interventions and have found graduate students quite sensitive to the methods and in generally good agreement in their judgments. Knowing the strength of treatments is crucial in ATI research because comparing weak with strong treatments is not likely to produce a very meaningful interaction effect (unless, of course, the "real-life" strengths of treatments are represented, and this also requires understanding the strength of treatment).
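Once a panel has supplied weights, scoring a given implementation is simple arithmetic. A sketch with entirely hypothetical weights and therapy profiles:

    # Expert-assigned weights for features of treatment (hypothetical values).
    weights = {"doctoral_training": 1.0, "years_experience": 0.3,
               "protocol_guided": 2.0, "sessions_per_week": 1.5}

    def strength(profile):
        return sum(weights[k] * profile.get(k, 0) for k in weights)

    strong = {"doctoral_training": 1, "years_experience": 10,
              "protocol_guided": 1, "sessions_per_week": 2}
    weak = {"doctoral_training": 0, "years_experience": 1,
            "protocol_guided": 0, "sessions_per_week": 1}
    print(strength(strong), strength(weak))   # 9.0 vs. 1.8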

Manualized Treatment

A move toward "manualized" treatment appears to be developing, fostered by the examples used in the National Depression Collaborative Research Program (Elkin, Parloff, Hadley, & Autry, 1985). Brief focused treatment used for high utilizers of health services is another example (Cummings, Dorken, Pallak, & Henke, 1989). Manualized treatments (i.e., those with formal protocols) may well turn out to be stronger treatments. A study of chemotherapy for cancer patients done in Finland (Karjalainen & Palva, 1989), for example, showed that patients treated by a protocol had better outcomes than patients treated according to the clinical judgments of their physicians. Holding therapists accountable (e.g., by close supervision) may also strengthen treatment. Noting that their analyses suggested weaker treatment effects than in another similar study, Elkin et al. (1989) suggest that the results may be attributable to the fact that therapists in the other study (Rush, Beck, Kovacs, & Hollon, 1977) were more closely supervised.

Study a Broader Range of Variability

The size of any effect, whether measured in terms of mean differences produced or variance accounted for, depends in part on the strength of the intervention. In the case of psychotherapy, strength would be adjusted in terms of the "amount" or size of the "dose" of therapy. In the case of variables not directly manipulated, the strength of the intervention would be realized by the range of values over which the variables were studied. Thus, for example, if one wanted to determine the effect of therapist experience on outcome, one could include values of therapist experience ranging from no experience to, say, 30 years. In fact, the first Vanderbilt Psychotherapy Research project (Strupp & Hadley, 1979) included untrained (no experience) therapists and others with an average of 23 years of experience. If one wanted to determine the relationship between initial level of depression and outcome of treatment, one could include clients ranging from those with very mild depression (or maybe no depression) to those with depression so severe that they require round-the-clock care. Amount of therapy could range from zero to at least hundreds of sessions. A type of therapy could be varied from just barely adherent to principles of a certain treatment approach to extraordinarily adherent to those principles.

Our impression is that, over the large body of studies that exist, many variables have been tested at fairly extreme values, but most individual studies have included only a fairly narrow range of values. Moreover, few studies have used extreme values of more than one variable, so that sensitivity to interactions has probably been limited. To wit, Wahlsten (1990) argues that reliable Heredity × Environment interactions have not been found in humans because only a narrow range of human environments has been studied. Although one may argue that studies should have representative designs, which might limit values of most variables to medium ranges, that argument may not be so strong if one believes that the main advantage of the search for interactions is the light it sheds on theoretical processes (see Shoham-Salomon & Hannah, 1991).

Using Strong Inference and Testing Alternative Models

For many years, research in psychotherapy was dominated by a strategy of opposing one therapy against one or more control groups chosen, or designed, to weaken or rule out some artifactual explanation for any therapeutic effect that might be found. Those artifacts to be ruled out often included, of course, nonspecific "ingredients" of treatment. If a therapeutic intervention proved no more effective than, for example, an intervention thought to do no more than arouse expectancies for improvement, then any appeal to a specific therapeutic effect was superfluous. That research strategy was quite consistent with the Popperian (Popper, 1959) epistemology emphasizing falsification of plausible rival hypotheses. However, more contemporary metascientists, especially Lakatos and Laudan, convincingly argue that different research traditions are not incommensurable and can be tested within the same experiment (Gholson & Barker, 1985).

At about the time psychotherapy had its beginnings, Chamberlin (1895/1965), a geologist, was urging the "method of multiple working hypotheses," insisting that progress in science would be faster if experimentation involved pitting alternative explanations against each other. Testing the effects of one variable against nothing is not at all efficient, nor is it very interesting. Many more ways exist for an idea to be wrong than right (Dawes, 1988), and showing that a hypothesis is not wrong (i.e., it is better than nothing) does not mean that the hypothesis is correct. Perhaps it is not even strengthened much by being found not wrong. On the other hand, if two rival explanations are pitted against each other, differing in some crucial respects, a result favoring one over the other is highly satisfying, even if not to be taken as proof. Cronbach and Snow (1977) point to the importance of selecting theoretically important variables from among the panoply imaginable and putting them to competitive tests, a process closely akin to what Platt (1964) called "strong inference."

Perhaps the time has come to abandon groups included in psychotherapy studies solely for the purpose of testing for nonspecific treatment effects. In fact, it has been suggested that strategies involving the pitting of alternative therapies against each other should be adopted (Dance & Neufeld, 1988; Garfield, 1990). The need for "no treatment" controls may have diminished, or disappeared altogether, with the introduction of "norms" that can now be derived from the work of meta-analysts (Sechrest & Yeaton, 1981a). If, on the average, therapeutic interventions can be estimated to have an effect amounting to 0.6 standard deviations improvement on a relevant dependent measure, then one does not need a no-treatment group to determine whether that effect is achieved by a therapy being tested. If a standard therapy against which a new therapy is to be tested is known to have generally larger effects than those produced by nonspecific treatment groups, then one does not need to introduce such groups into every therapy study. Control groups do consume resources that might better be spent on improvements in comparisons between studies. A strong conceptual feature of the NIMH Treatment of Depression Collaborative Research Program (Elkin et al., 1985) was the use of imipramine plus clinical management as a "standard referent treatment."

The purposes of pitting therapies against each other would be broader than simply determining which might be most effective. Ideally, therapies should be pitted against each other because they involve different assumptions about mechanisms of action, likely outcomes, and so on.

Make Sure the ATI Is Meaningful

Last, but not least, if researchers are going to test ATI hypotheses, they need to justify the expense of the ATI research effort in terms of the meaningfulness of the interaction effect. If an interaction is ordinal (panel 2, Figure 1), the mean of clients in one condition would be higher overall than the mean of clients in the other condition. That is, there should be a main effect for that condition, although the main effect might be altogether spurious in the sense of being solely attributable to the interaction. If the interaction were strong, both the main effect and the interaction should be apparent. In any case, in a meta-analysis across a population of studies, one would think that an important interaction would be manifest in a consistent tendency for one condition, say experienced therapist, to have better outcomes than another (e.g., inexperienced therapist). Dawes (1988) also shows that ordinal interactions are usually very well estimated by separate linear effects so that even when they occur, they provide little improvement over estimates made by additive combinations of the variables involved (see Figure 2). This point is illustrated by the results of Kadden, Cooney, Getter, and Litt (1989), in which matching alcoholics with treatments improved R² from .10 to .16. Even though this ATI improved prediction by 60% in relative terms, in absolute terms it added only 6 percentage points of explained variance (see Pickering, 1983). A 6% gain is probably not enough to justify the considerable amount of effort required to effect differential assignment. Even from a purely theoretical perspective, it is questionable that such a small increase in explanatory power should be viewed as a particularly important finding.
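Keeping the relative and absolute gains distinct is worth a line of arithmetic:

    # Relative vs. absolute gain from matching, restating the figures above.
    r2_additive, r2_matched = 0.10, 0.16
    print(round((r2_matched - r2_additive) / r2_additive, 2))  # 0.6: prediction improved 60%
    print(round(r2_matched - r2_additive, 2))                  # 0.06: only 6 points of variance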

    Dawes's conclusions about the extent to which ordinal interactions may be approximated by additive linear effects are often obscured by the ways in which effects are graphed. Slopes can be made to appear flat or steep depending on the scales chosen for the ordinates and abscissae. One needs to determine the goodness of fit of the additive model by statistical, not visual, analyses.
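    In practice, the statistical comparison amounts to fitting the additive and interactive models and testing the increment with an F test. The sketch below, in Python with simulated data and hypothetical variable names, illustrates the idea; it is ours, not an analysis drawn from the studies cited.

        # Sketch: testing the additive model against the interaction model
        # statistically rather than visually. Data are simulated.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf
        from statsmodels.stats.anova import anova_lm

        rng = np.random.default_rng(7)
        n = 200
        df = pd.DataFrame({
            "aptitude": rng.normal(size=n),
            "treatment": rng.integers(0, 2, n),  # 0 = therapy Y, 1 = therapy Z
        })
        df["outcome"] = (0.5 * df["aptitude"] + 0.4 * df["treatment"]
                         + 0.15 * df["aptitude"] * df["treatment"]  # weak ordinal ATI
                         + rng.normal(size=n))

        additive = smf.ols("outcome ~ aptitude + treatment", data=df).fit()
        interactive = smf.ols("outcome ~ aptitude * treatment", data=df).fit()

        print(anova_lm(additive, interactive))          # F test of the interaction
        print(additive.rsquared, interactive.rsquared)  # absolute gain in R^2

    Echoing Dawes's point, the additive fit will usually be nearly as good; the F test and the R² increment, not the apparent steepness of plotted slopes, are the evidence that matters.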

    An additional problem of interpretation is created by the fact that the metrics in which outcome measures are expressed often lack direct meaning (Cronbach & Snow, 1977; Sechrest & Yeaton, 1981c). What exactly does it mean when, according to their therapists, some clients stand at -.5 and others at .3 on a standardized residual of a global outcome rating (Talley, Strupp, & Morey, 1990)? (We do not mean to devalue this or any other studies cited but use them only as examples; the problem is ubiquitous.) The problem of difficult-to-interpret metrics is, of course, not peculiar to analyzing interactions, but it is more troublesome in ATI than in main effect research. A main effect of .6 merely suggests that all cases should be treated in a uniform way, which, unless the costs of doing so are large, poses no problem. An interaction requires, however, that treatment be differential, which means that differential classification must be carried out, that two or more forms of treatment be available, and so on. Consequently, the question of whether an effect size is large enough, in some absolute and practically meaningful sense, to justify accepting the implications of an interaction is potentially the most important question in ATI research.

    Conclusions

    We do not believe that it is fruitful to continue to try to "discover" ATIs in psychotherapy. In this discovery mode of research, therapeutic interventions are studied and, incidentally, a long list of other variables is measured. Then, especially when anticipated main effects fail to appear, the data are combed for interactions. Such searches are all but doomed to failure: few ATIs are ever found, and those that are found prove to be trivial, ephemeral, or both.

    We believe that if important ATIs are to be identified, it will be through deliberate tests of theoretically driven a priori hypotheses. First, ATI hypotheses must be justified in advance, just as main effects must be. No one would advocate trying interventions at random to see whether it might be possible to find one that works; similarly, we do not think it worth looking randomly for ATIs. Second, we believe that a priori analysis of the practical or theoretical import of ATIs should be undertaken. If one has a notion that an ATI might have some practical value, then it ought to be possible to specify in advance just how and under what circumstances it might have value. One should also be able to say in advance just how the verification of an ATI intended to advance theory would actually advance it. Third, the treatment, or intervention, should be developed in such a way as to have a high likelihood of inducing the interaction, and at sufficient strength to make it likely that the interaction could be detected if it is operative. Fourth, experiments must be of sufficient size to permit the study of interactions. This means that power calculations need to be done for interactions rather than merely for main effects. Most therapy studies are not large enough to have reasonable power to detect interactions; indeed, power calculations are not often done at all. They must be done, and they must be done before the fact. Finally, statistical analyses should be appropriate to the problem posed and the data collected. The inclination to convert continuous variables into categorical ones must be abandoned (e.g., see Cronbach & Snow, 1977), and journal editors should begin enforcing that ban immediately. Much more attention needs to be paid to the possibility, indeed the probability, that a large proportion of the relationships we are interested in are not linear, and some may not even be close to linear. We need not worry overly much about modest curvilinearity, but some relationships are probably sharply curvilinear (e.g., asymptotic), and some may even be nonmonotonic. It is possible, for instance, that the relationship between amount of training for therapy and treatment outcome is asymptotic or parabolic (i.e., an inverted U-shaped function): a moderate amount of training might produce maximum effects, in which case experience would not be linearly related to therapeutic success.
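    The fourth point lends itself to a concrete illustration. Below is a minimal simulation sketch, in Python, of an a priori power calculation aimed at the interaction itself; the effect sizes, sample sizes, and function name are all hypothetical, chosen only to show the form such a calculation might take.

        # Sketch: a priori power for an Aptitude x Treatment interaction,
        # estimated by simulation before any data are collected.
        import numpy as np
        import statsmodels.api as sm

        def interaction_power(n_per_group, beta_int, n_sims=2000, alpha=0.05):
            """Power to detect the product term in a two-group design
            with a continuous aptitude, using ordinary least squares."""
            rng = np.random.default_rng(0)
            hits = 0
            for _ in range(n_sims):
                apt = rng.normal(size=2 * n_per_group)
                trt = np.repeat([0.0, 1.0], n_per_group)
                y = (0.3 * apt + 0.3 * trt + beta_int * apt * trt
                     + rng.normal(size=2 * n_per_group))
                X = sm.add_constant(np.column_stack([apt, trt, apt * trt]))
                hits += sm.OLS(y, X).fit().pvalues[3] < alpha  # interaction term
            return hits / n_sims

        # A study of 50 cases per group may be badly underpowered for a
        # modest interaction:
        print(interaction_power(n_per_group=50, beta_int=0.2))

    Running such a simulation before the fact makes plain whether a proposed design has any realistic chance of detecting the ATI it was built to find.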

    The above discussion of therapist experience offers an example of how inaccuracy in measurement can lead to Type III errors. Therapist experience is a relatively easy-to-collect pseudomeasure of what researchers are truly interested in, namely, therapist competence. The lacuna in psychotherapy outcome research concerning the role of therapist competence is rather peculiar (see also Schaffer, 1982). To the best of our knowledge, therapist competence has never been directly assessed and studied in relation to therapy process or outcome. Inadequate proxies such as therapist training and years of experience have been used, and their observed effect seems to be nil. Competence needs to be assessed in some other, more meaningful way. Indeed, most constructs of interest to psychotherapists are measured very inaccurately. Until we set higher standards for the reliability and validity of our measures of aptitudes and treatments (As and Ts), it is very unlikely that we will find any ATIs.
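    The measurement point can be made quantitative. A standard psychometric result, which we add here by way of illustration rather than derivation, is that for centered, uncorrelated component measures the reliability of a product term is approximately the product of the component reliabilities:

        \rho_{AT} \approx \rho_{AA}\,\rho_{TT}, \qquad \text{e.g., } (.80)(.80) = .64

    Two measures that would be considered respectable on their own thus combine into an interaction term of mediocre reliability, further attenuating any observed ATI.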

    We want to end by noting that, as in all scientific enterprises, knowledge about psychotherapy must advance by increments, usually small ones. Neither for main effects nor for ATIs are answers going to come from single studies. The task of generalizing from extant research is one whose difficulty has been greatly underestimated, as a reading of Cook (1990) will make clear. If we are to advance in our understanding, it will have to be on the basis of extensive research and broad wisdom and intelligence about it. "Extrapolation and broad interpretation are guided by theoretical understanding, based on intelligent consideration of findings from the whole corpus of research" (Cronbach & Snow, 1977, p. 22). We agree.

    References


    Aronson, E. & Carlsmith, J. M. (1968). Experimentation in social psychology. In G. Lindzey & E. Aronson (Eds.), Handbook of social psychology (2nd ed., Vol. 2, pp. 1-79). Reading, MA: Addison-Wesley.
    Blanchard, E. B., Appelbaum, K. A., Radnitz, C. L., Michultka, D., Morrill, B., Kirsch, C., Hillhouse, J., Evans, D. D., Guarnieri, P., Attanasio, V., Andrasik, F., Jaccard, J. & Dentinger, M. P. (1990). Placebo-controlled evaluation of abbreviated progressive muscle relaxation and relaxation combined with cognitive therapy in the treatment of tension headache. Journal of Consulting and Clinical Psychology, 58, 210-215.
    Blanchard, E. B., Appelbaum, K. A., Radnitz, C. L., Morrill, B., Michultka, D., Kirsch, C., Guarnieri, P., Hillhouse, J., Evans, D. D., Jaccard, J. & Barron, K. D. (1990). A controlled evaluation of thermal biofeedback and thermal biofeedback combined with cognitive therapy in treatment of vascular headache. Journal of Consulting and Clinical Psychology, 58, 216-224.
    Campbell, D. T. & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait—multimethod matrix. Psychological Bulletin, 56, 81-105.
    Chamberlin, T. (1965). The method of multiple working hypotheses. Science, 148, 754-759. (Original work published 1895)
    Cohen, J. (1990). Some things I have learned. American Psychologist, 45, 1304-1312.
    Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum.
    Cook, T. D. (1990). The generalization of causal connections: Multiple theories in search of clear practice. In L. Sechrest, E. Perrin, & J. Bunker (Eds.), Research methodology: Strengthening causal interpretations of nonexperimental data (pp. 9-31). Rockville, MD: Agency for Health Care Policy and Research.
    Cook, T. D. & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston: Houghton Mifflin.
    Cronbach, L. J. (1975). Beyond the two disciplines of scientific psychology. American Psychologist, 30, 116-126.
    Cronbach, L. J. & Snow, R. E. (1977). Aptitudes and instructional methods: A handbook for research on interactions. New York: Irvington.
    Cummings, N. A., Dorken, H., Pallak, M. S. & Henke, C. (1989). The impact of psychological intervention on health care utilization and costs: The Hawaii Medicaid Project. Unpublished final project report No. 11-C-98344/9.
    Dance, K. A. & Neufeld, R. W. (1988). Aptitude-treatment interaction research in clinical settings: A review of attempts to dispel the "patient uniformity" myth. Psychological Bulletin, 104, 192-213.
    Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571-582.
    Dawes, R. M. (1988). Rational choice in an uncertain world. New York: Harcourt Brace Jovanovich.
    Dawes, R. M. (1991). Professional practice versus knowledge in psychology. Manuscript in preparation.
    Dush, D. M. (1987). The placebo in psychosocial outcome evaluations. Evaluation and the Health Professions, 9, 421-438.
    Elkin, I., Parloff, M., Hadley, S. & Autry, J. (1985). NIMH Treatment of Depression Collaborative Research Program: Background and research plan. Archives of General Psychiatry, 42, 305-316.
    Elkin, I., Shea, M. T., Watkins, J. T., Imber, S. D., Sotsky, S. M., Collins, J. F., Glass, D. R., Pilkonis, P. A., Leber, W. R., Docherty, J. P., Fiester, S. J. & Parloff, M. B. (1989). National Institute of Mental Health Treatment of Depression Collaborative Research Program: General effectiveness of treatments. Archives of General Psychiatry, 46, 971-982.
    Elliot, R. (1986). Interpersonal process recall as a psychotherapy process research method. In L. S. Greenberg & W. M. Pinsof (Eds.), The psychotherapeutic process: A research handbook. New York: Guilford Press.
    Garfield, S. L. (1990). Issues and methods in psychotherapy process research. Journal of Consulting and Clinical Psychology, 58, 273-280.
    Gholson, B. & Barker, P. (1985). Kuhn, Lakatos, and Laudan: Applications in the history of physics and psychology. American Psychologist, 40, 755-769.
    Goldstein, A. P., Heller, K. H. & Sechrest, L. B. (1966). Psychotherapy and the psychology of behavior change. New York: Wiley.
    Higginbotham, H. N., West, S. G. & Forsyth, D. R. (1988). Psychotherapy and behavior change. New York: Pergamon Press.
    Imber, S. D., Pilkonis, P. A., Sotsky, S. M., Elkin, I., Watkins, J. T., Collins, J. F., Shea, M. T., Leber, W. R. & Glass, D. R. (1990). Mode-specific effects among three treatments for depression. Journal of Consulting and Clinical Psychology, 58, 352-359.
    Jacobson, N. S., Follette, W. C. & Pagel, M. (1986). Predicting who will benefit from behavioral marital therapy. Journal of Consulting and Clinical Psychology, 54, 518-522.
    Kadden, R. M., Cooney, N. L., Getter, H. & Litt, M. D. (1989). Matching alcoholics to coping skills or interactional therapies: Posttreatment results. Journal of Consulting and Clinical Psychology, 57, 698-704.
    Karjalainen, S. & Palva, I. (1989). Do treatment protocols improve end results? A study of survival of patients with multiple myeloma in Finland. British Medical Journal, 299, 1069-1072.
    Landman, J. T. & Dawes, R. M. (1982). Psychotherapy outcome: Smith and Glass's conclusions stand up. American Psychologist, 37, 504-516.
    Lazarus, A. A. (1990). If this be research... American Psychologist, 44, 670-671.
    Luborsky, L., Singer, B. & Luborsky, L. (1975). Comparative studies of psychotherapy: Is it true that "everyone has won and all must have prizes"? Archives of General Psychiatry, 32, 995-1008.
    Mahoney, M. J. (1978). Experimental methods and outcome evaluation. Journal of Consulting and Clinical Psychology, 46, 660-672.
    Mitroff, I. A. & Featheringham, T. R. (1974). On systemic problem solving and the error of the third kind. Behavioral Science, 19, 383-393.
    Pickering, T. G. (1983). Treatment of mild hypertension and the reduction of cardiovascular mortality: The "of or by" dilemma. Journal of the American Medical Association, 249, 399-400.
    Platt, J. R. (1964). Strong inference. Science, 146, 347-353.
    Popper, K. R. (1959). The logic of scientific discovery. New York: Basic Books.
    Rush, A. J., Beck, A. T., Kovacs, M. & Hollon, S. (1977). Comparative efficacy of cognitive therapy and pharmacotherapy in the treatment of depressed patients. Cognitive Therapy and Research, 1, 17-37.
    Schaffer, N. D. (1982). Multidimensional measures of therapist behavior as predictors of outcome. Psychological Bulletin, 92, 670-681.
    Scott, A. G. & Sechrest, L. (1989). Strength of theory and theory of strength. Evaluation and Program Planning, 12, 329-336.
    Scott, A. G. & Sechrest, L. (in press). Theory-driven approach to cost-benefit analysis: Implications of program theory. In H. Chen & P. Rossi (Eds.), Policy studies organization. Westport, CT: Greenwood Press.
    Sechrest, L. & Redner, R. (1979). Strength and integrity of treatments in evaluation studies. In How well does it work? Review of criminal justice evaluation, 1978: 2. Review of evaluation results, corrections (pp. 19-62). Washington, DC: National Criminal Justice Reference Service.
    Sechrest, L. & West, S. G. (1983). Measuring the intervention in rehabilitation experiments. International Annals of Criminology, 21(1), 11-19.
    Sechrest, L. & Yeaton, W. H. (1981a). Empirical bases for estimating effect size. In R. F. Boruch, P. M. Wortman, D. S. Cordray, & Associates (Eds.), Reanalyzing program evaluations: Policies and practices for secondary analysis of social and educational programs (pp. 212-224). San Francisco: Jossey-Bass.
    Sechrest, L. & Yeaton, W. H. (1981b). Estimating magnitudes of experimental effects. Journal Supplements Abstract Service: Catalog of Selected Documents in Psychology, 11, Ms. No. 2355 (39 pp.).
    Sechrest, L. & Yeaton, W. H. (1981c). Meaningful measures of effect. Journal of Consulting and Clinical Psychology, 49, 766-767.
    Shoham-Salomon, V. & Hannah, M. T. (1991). Client—treatment interactions in the study of differential change processes. Journal of Consulting and Clinical Psychology, 59, 217-225.
    Smith, M. L., Glass, G. V. & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.
    Strupp, H. H. (1989). Psychotherapy: Can the practitioner learn from the researcher? American Psychologist, 44, 717-724.
    Strupp, H. H. & Hadley, S. W. (1979). Specific vs. nonspecific factors in psychotherapy. Archives of General Psychiatry, 36, 1125-1136.
    Talley, P. F., Strupp, H. H. & Morey, L. C. (1990). Matchmaking in psychotherapy: Patient—therapist dimensions and their impact on outcome. Journal of Consulting and Clinical Psychology, 58, 182-188.
    Thompson, L. W., Gallagher, D. & Breckenridge, J. S. (1987). Comparative effectiveness of psychotherapies for depressed elders. Journal of Consulting and Clinical Psychology, 55, 385-390.
    Wahlsten, D. (1990). Insensitivity of the analysis of variance to heredity-environment interaction. Behavioral and Brain Sciences, 13, 109-161.
    1 This hypothetical extrapolation is used here to make a point; we do not encourage such extrapolations from real data. We do, however, encourage studying broader ranges of variables.



    We wish to thank Carol Sigelman and two anonymous reviewers for their helpful comments.
    Correspondence may be addressed to Lee Sechrest, Department of Psychology, University of Arizona, Tucson, Arizona, 85721.
    Received: June 15, 1990
    Revised: October 15, 1990
    Accepted: October 15, 1990

    Figure 1. Illustrations of disordinal, ordinal, mixed, and nonlinear Aptitude × Treatment interactions.




    Figure 2. Illustration of the relative predictive value of main effects versus ordinal interactions.