According to this criterion, the MTMM-matrix is judged by counting the number of significant correlations on the validity diagonals. These validity coefficients are the correlations between similar traits measured by different methods. When our four methods are compared with each other, each method has 18 validity coefficients in common with the three other methods. The significant coefficients are counted for each method (see Table 3).
Fifty-three percent of the validity coefficients are statistically significant. However, 3 or 4 of the maximum of 72 correlations may be significant by chance alone (p < .05; two-tailed test).
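As a check on this chance expectation, the arithmetic can be sketched in a few lines; the figures of 72 coefficients and the .05 level are taken from the text above.

```python
# Expected number of validity coefficients that reach "significance" by
# chance alone under a two-tailed alpha of .05; the count of 72 coefficients
# is the total number of validity coefficients in the matrix described above.
n_coefficients = 72
alpha = 0.05

expected_by_chance = n_coefficients * alpha
print(expected_by_chance)  # 3.6, i.e. roughly 3 or 4 spurious "hits"
```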
A closer view of these moderate findings reveals support for the MAAS-MI MH by the Global Expert-Ratings, except for the Exploration of the Reasons for Encounter. These findings have already been reported in the previous chapter, in which we also concluded that experts do not support the theoretical content of the scales Exploration of the Reasons for Encounter and Presenting Solutions, because of differences in theoretical orientation in the former and lack of reliability at the item level in the latter.
Convergent Validity of MAAS-MI in Mental Health is supported by other measures of interviewing skills
Furthermore, we note a lack of support from the MAAS-MI Global Self-Ratings. This is probably due to the deficient reliability and validity of global-rating scales, which has been universally noted in the measurement of other domains of medical competence (among others, Streiner, 1985). The lack of validity is hardly due to the self-evaluation aspect of this method, because the MAAS-MI MH SELF supports the validity of the MAAS-MI MH at least in the scales Exploration of the Reasons for Encounter, History-taking and Presenting Solutions. It seems evident that interviewers are better able to evaluate their own technical, problem-solving aspects in the three phases of initial interviews than the process aspects, such as interpersonal and communicative skills.
This criterion requires the entries on the validity diagonal to be higher than the heterotrait-heteromethod values in the column and row in which a certain validity coefficient is located.
The entries on the validity diagonal mainly consist of covariance of the same trait measured by two different methods. In the heterotrait-heteromethod triangles, the correlations are mainly built up from covariance arising from pairs of different traits measured by these two different methods. This second criterion thus implies that the covariance of the same trait measured by different methods is compared with the covariance of one of these traits with a different trait. This comparison is a test of whether each trait can be discerned from another, with the measurement methods held constant.
To investigate this criterion, the number of times an entry on the validity diagonal is higher than the entries in the related row and column in the two adjoining heterotrait-heteromethod triangles is counted. Each entry on the validity diagonal is therefore compared with two other values, as is shown in Table 4.
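The counting procedure for this second criterion can be sketched as follows; the function name and the illustrative 3-trait block are hypothetical and the figures are fabricated, not taken from our tables.

```python
import numpy as np

# Sketch of the Campbell-Fiske second criterion: within one heteromethod
# block (traits x traits; rows = method A, columns = method B), each validity
# coefficient on the diagonal is compared with the heterotrait-heteromethod
# values in its own row and column.
def count_criterion2(block):
    block = np.asarray(block, dtype=float)
    t = block.shape[0]
    wins, comparisons = 0, 0
    for i in range(t):
        validity = block[i, i]
        rivals = np.concatenate([np.delete(block[i, :], i),   # same row
                                 np.delete(block[:, i], i)])  # same column
        wins += int(np.sum(validity > rivals))
        comparisons += rivals.size
    return wins, comparisons

# Fabricated 3-trait heteromethod block; validities .50/.40/.60 on the diagonal.
block_demo = [[0.50, 0.10, 0.20],
              [0.15, 0.40, 0.05],
              [0.25, 0.45, 0.60]]
print(count_criterion2(block_demo))  # (11, 12): one comparison fails
```

Summing such counts per trait (rows) and per method combination (columns) yields the marginals discussed below for Table 4.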
These marginals of rows and columns are expressed as ratios between obtained and maximum possible scores (60), providing a measure of discriminating capability. In general, it can be stated that the marginals of the columns are a "measure of quality" of a certain combination of methods, whereas the marginals of the rows are a measure of the divergent validity of a certain trait.
In Table 4, the sum scores of the rows reveal that the History-taking and Presenting Solutions traits show strong evidence of divergent validity, whereas the traits pertaining to Structuring the interview and to Interpersonal Skills only have a moderate divergent validity. Their validity coefficients contrast substantially with the comparable correlations in the heterotrait-heteromethod triangles, irrespective of the methods used. The traits of Exploration of the Reasons for Encounter and of Communicative Skills again show insufficient evidence of divergent validity.
Looking at the column marginals of Table 4, it turns out that combinations of the MAAS-MI MH, MAAS-MI SELF and the MAAS-MI Global Expert-Rating exhibit the best ability to discriminate the six traits. The relatively great ability of the MAAS-MI SELF and the low ability of the MAAS-MI Global Self-Ratings to discriminate the traits are noteworthy.
In this respect, the MAAS-MI MH Global Self-Ratings suffer, apart from error variance, from two other sources of unreliability due to the characteristics of global-rating scales (see MAAS Medical Interview Construction): high method variance and poor operationalization of the underlying theoretical concepts (content validity). Under the headings of criteria 3 and 4, we discuss this subject in greater depth.
Global ratings of interviewing skills in Mental Health suffer from poor operationalization of concepts
According to this criterion, the validity coefficients must be higher than the off-diagonal correlations in their monomethod triangle. Measures of similar traits obtained by different methods should intercorrelate more highly than measures of different traits obtained by the same method. When the validity coefficients are lower than the off-diagonal correlations in their monomethod triangle, there is evidence that the traits are highly intercorrelated and/or that there is a high (confounding) method variance (Campbell and Fiske, 1959; Schmitt and Stults, 1986).
To study this criterion, we counted the number of times that the three validity coefficients for each trait are higher than the mean of the heterotrait monomethod triangles (see Table 5).
For each trait, three validity coefficients arise when the measurement of this trait by one method is correlated with the measurement of this trait obtained by the three other methods.
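A minimal sketch of this third-criterion count, assuming a fabricated 3-trait monomethod matrix and fabricated validity coefficients (neither taken from our tables):

```python
import numpy as np

# Third criterion: a validity coefficient should exceed the mean of the
# off-diagonal (heterotrait) correlations in its monomethod triangle.
def monomethod_mean(corr):
    """Mean of the off-diagonal correlations of one method's trait matrix."""
    corr = np.asarray(corr, dtype=float)
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return corr[mask].mean()

# Fabricated monomethod correlation matrix and validity coefficients.
mono = [[1.00, 0.30, 0.20],
        [0.30, 1.00, 0.40],
        [0.20, 0.40, 1.00]]
validities = [0.45, 0.25, 0.35]

threshold = monomethod_mean(mono)             # (.30 + .20 + .40) / 3 = 0.30
hits = sum(v > threshold for v in validities)
print(round(threshold, 2), hits)              # 0.3 2
```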
Looking at the column marginals, which can be considered a measure of the divergent validity of a certain method, we notice that the MAAS-MI MH meets this criterion 22.2% of the time, the MAAS-MI SELF about 16.7% of the time, the Global Expert-Rating Scale 11% of the time, and the Global Self-Rating Scale never. The row marginals can be taken as a measure of the divergent validity of each trait. They reveal that History-taking meets this criterion 50% of the time, whereas the other traits vary from never to twice meeting it.
These results, which seem rather modest for the MAAS-MH, have to be judged in the light of some critical methodological remarks as to this third criterion. In the application of this criterion, Campbell and Fiske have aimed to compare method variance with trait variance, considering the validity coefficient as common trait variance and the values in the heterotrait-monomethod triangle as results of method variance. This statement does not entirely hold true in our situation.
First, in the discussion of the second criterion, we noticed the confounding of the validity coefficients with shared variance between the two methods compared. This method co-variance is highly probable in the combination of MAAS-MI MH SELF and MAAS-MI MH Global Self-Ratings because of the comparability of both self-evaluation methods.
Second, the off-diagonal correlations in the monomethod triangles, which might be indicative of the method variance, may be confounded and inflated by existing trait intercorrelations. It is highly improbable that the traits are independent of each other, because they fit one unidimensional Rasch scale (see Scalable, Reliable, Generalizable).
To summarize: this criterion is difficult to meet because the method variance, which is already fairly high, has in addition been inflated by both aforementioned influences.
As stated in Convergent & Divergent Validity — The Matrix Unravelled, Fiske (1971) approached this criterion with greater subtlety, accepting method influences as an inseparable aspect of measurement. In our opinion, measurements are not invalidated when the correlations in the heterotrait-monomethod triangles exceed the corresponding validity diagonals. Researchers, however, must be aware that a substantial degree of variance in their measurements is attributable to the measurement method. In such situations, additional studies, such as the generalizability analyses, are needed to determine the magnitude of method variance components.
Following the recommendations made by certain authors (Schmitt et al., 1986) advocating the use of factor analysis in the study of the MTMM-matrix, we studied this criterion in a two-fold factorial design.
First Hypothesis
First, the scores on the 6 traits are factor analyzed for each of the four methods. Second, the six hetero-method blocks of the MTMM-matrix are factor analyzed. In these blocks, all combinations of the two sets of 6 traits, each measured with two methods, are factor analyzed.
We now restate this fourth criterion into two hypotheses, which should be confirmed after inspection of both sets of factor structures:
- The factor structure of the 6 traits found within each of the four measurement methods should be similar when we measure the same interviews with each of these methods.
- The factor structure of the traits in the six hetero-method blocks of the MTMM-matrix should be such that similar traits, measured by different methods, load on the same factor.
It goes without saying that we measure the same interviews with each method. An illustration makes this clearer: take, for instance, the block where the six traits are measured with the MAAS-MI MH Global Expert-Rating and with the MAAS-MI MH Global Self-Rating. When an important factor is found with high loadings of interpersonal and communicative skills measured by the expert ratings, the factor loadings of both traits measured by the self-rating should also be considerable on this same factor.
The first hypothesis has been tested by factor analyzing the trait scores of the MAAS-MI MH SELF (N=80), the MAAS-MI MH Global Self-Ratings (N=80) and the MAAS-MI MH Global Expert-Ratings (N=160). Using the standard SPSS program (Nie et al., 1975), principal component analysis was carried out, followed by Varimax rotation of the factors with an eigenvalue exceeding 1.
In Table 6, we present the factor structures obtained by measuring 6 traits with each of the four methods. Factors with an eigenvalue of more than 1 and factor loadings higher than 0.6 are indicated. The factor loadings are indicated in order of decreasing magnitude.
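The eigenvalue-greater-than-one retention rule used above can be illustrated on a fabricated correlation matrix; the Varimax rotation step is omitted for brevity, and the data are not ours.

```python
import numpy as np

# Kaiser's criterion: retain as many factors as there are eigenvalues of the
# correlation matrix that exceed 1.
def factors_to_retain(corr):
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return int(np.sum(eigenvalues > 1.0))

# Fabricated 4-trait correlation matrix with two clear clusters (1-2 and 3-4);
# its eigenvalues are exactly 2.0, 1.6, 0.2 and 0.2.
corr_demo = np.array([[1.0, 0.8, 0.1, 0.1],
                      [0.8, 1.0, 0.1, 0.1],
                      [0.1, 0.1, 1.0, 0.8],
                      [0.1, 0.1, 0.8, 1.0]])
print(factors_to_retain(corr_demo))  # 2: one factor per cluster
```

A single retained factor, as discussed below, is what Saal et al. (1980) and Dielman et al. (1980) take as a sign of high method variance.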
In the factor structure of the MAAS-MI MH, the traits Interpersonal and Communicative Skills are most prominent in the main factor, whereas the traits Exploration of the Reasons for Encounter, History-taking and Presenting Solutions load on the second factor. This means that in the MAAS-MH, measurement of the process aspects of interviewing predominates over measurement of the content of the interview.
The MAAS-MI MH SELF shows a strong method variance, accounted for by the single factor found in principal component analysis (Saal et al., 1980; Dielman et al., 1980). This factor shows a pattern different from the factor structure of the MAAS-MH. Measurement of patient- and physician-centered information collection is the most striking measurement property, as witnessed by high factor loadings on the traits of Exploration of the Reasons for Encounter and History-taking.
The factor structure of the MAAS-MI MH Global Self-Ratings takes a middle-of-the-road position between the patterns of the MAAS-MH and the MAAS-SELF. Principal component analysis also yields one factor (method variance) that shows a mixed pattern of loadings on the traits Exploration of the Reasons for Encounter and History-taking, and, to a lesser extent, on Communicative Skills. This pattern seems to represent a measurement property of this method, stressing the effective collection of patient- and physician-centered data pertaining to the presented mental health problem.
The factor structure of the MAAS-MI MH Global Expert-Ratings reveals a completely different picture. The single factor found evidences considerable method variance, with a pattern of factor loadings different from the three other methods: the traits Communicative Skills, Exploration of the Reasons for Encounter, Presenting Solutions and Structuring the Interview load on this factor.
The interpretation of this factor structure may indicate the effective collection of patient-centered data by the physician. This interpretation is comparable with that of the factor structure of the MAAS-MH.
These findings permit the following conclusions to be drawn:
According to the factor structures and patterns of factor loadings, the trait interrelationships found in the four measurement methods differ from each other. However, the MAAS-MI MH SELF and Global Self-Rating on the one hand, and the MAAS-MI MH and Global Expert-Rating on the other, show some mutual resemblance.
This finding implies that our first hypothesis, relating to similar trait interrelationships within each of the four measurement methods, is not confirmed.
Second Hypothesis
The second hypothesis is studied by factor analyzing the six heteromethod blocks of the MTMM-matrix (see Table 2). The standard SPSS program used in testing the previous hypothesis was employed again.
When we examine the resulting factor structures with their major factor loadings, it is evident that our second hypothesis is not confirmed. In all six heteromethod blocks, the same pattern arises: the traits loading on one factor are clusters of different traits measured by one method, instead of pairs of the same traits measured by two different methods. We take again our previously mentioned example of the heteromethod block where the six traits are measured with the MAAS-MI MH Global Expert- and with the Global Self-Rating Scales (see Table 7).
We would expect factors with substantial loadings on paired traits: for example, a factor I on which EE, ET and CS, measured by both methods, load together. We see, however, that EE, ET and CS measured by the Global Expert-Rating Scale and EE, ET and CS measured by the Global Self-Rating Scale load on two different factors instead of one. A similar pattern is notable in all six heteromethod blocks.
This finding is due to the high method variance in our methods, resulting in the clusters of different traits measured by the same method in the factors we obtained. The next section is devoted to this important issue of method variance.
Nevertheless, we have to draw the conclusion that we cannot meet this fourth criterion for convergent and divergent validity.
Method variance is not of mere theoretical importance, but can be attributed to effects that are very common in measurement practice: leniency and halo-effects, the latter being the most significant. Halo-effects are consistently conceptualized as an observer's failure to discriminate among conceptually distinct aspects of a subject's behavior. Halo-effects are larger when variables have a moral connotation (such as our items pertaining to "interpersonal and communicative skills") or when single variables are not easily observed and/or are ill-defined (Streiner, 1985, citing Allport, 1937). Observers seem to rate an overall impression of the subjects under research conditions that are susceptible to halo-effects. As a result, observers are barely able to assess more than one or two dimensions accurately, and all items are consequently associated with each other (Thorndike, 1920; Guilford, 1954; Saal et al., 1980). Therefore, global rating scales are particularly susceptible to halo-effects (Streiner, 1985).
Global rating scales rate an overall impression and are particularly susceptible to halo-effects
The problem is to assess the magnitude of method variance in our four instruments. Our design does not allow an exact assessment, but we study in greater depth the manifestations of method variance which we have already encountered in this chapter.
First, in the previous section we cited Saal et al. (1980) and Dielman et al. (1980), who state that the finding of one single factor in principal component factor analysis suggests a high method variance. Principal component analyses of the trait scores of the MAAS-MI MH SELF, Global Self-Ratings and Global Expert-Ratings all yield one factor with an eigenvalue higher than one (cf. Table 6). By contrast, in the MAAS-MI MH, two factors arise after principal component analysis of the traits.
Second, the magnitude of the averaged correlations between the traits in the monomethod triangles is an indication of the amount of method variance. This evidence is even stronger when it is assumed that the intercorrelation of theoretically different traits is zero to low. In the MTMM-matrix, the averaged correlations between the traits in the monomethod triangles are considerable for the MAAS-MI MH SELF, Global Self-Ratings and Global Expert-Ratings (0.38, 0.32 and 0.39, respectively), whereas the averaged intercorrelations between the traits in the corresponding triangle of the MAAS-MH are lower (0.25).
Third, we notice in the factors of the heteromethod blocks a clustering of different traits measured with the same method, instead of clustering of similar traits measured with two different methods. This indicates that the clustering of different traits in one factor is due to the method variance, which these trait measures share.
These findings suggest considerable method variance, mainly caused by halo-effects, in all methods, though least present in the MAAS-MI MH.
MAAS-MI in Mental Health, where items are defined and well-operationalized, is best suited to measure interviewing skills in Mental Health
This characteristic of the MAAS-MI MH is the consequence of the constructors' efforts to define and operationalize the interviewing skills behaviorally, by expressing scoring criteria in single or multiple behavioral acts. Nevertheless, this conclusion is rather contradicted by the considerable method variance in the MAAS-MI MH SELF, especially when we consider that the MAAS-MH and the MAAS-SELF have almost identical items. We explain the higher method variance of the MAAS-MI MH SELF by the time lag of about 10-20 minutes between the interview and the rating of the checklist, causing a deficient recall of the performed interviewing behavior and leading to the induction of halo-effects. This deficient recall may impair discrimination between the questions asked by the physician himself and the topics spontaneously raised by the patient.
MAAS-MI MH Global Self- and Global Expert-Ratings suffer from method variance because global rating scales are vulnerable to halo-effects (see MAAS Medical Interview Construction). It is striking, however, that the MAAS-MI MH Global Expert-Ratings show the highest halo-effect (as witnessed by the correlations in the corresponding monomethod triangles). These findings are rather disappointing, as we expected the experts to be very familiar with the concepts of interviewing in primary mental health care because of their own teaching and health-care experience in this domain.
An important consequence of high method variances may be their confounding effects in validity research. When methods are compared that share a high method variance of the same type, their intercorrelations may be too high, spuriously boosting convergent validity and attenuating divergent validity. In the case of high method variance which is not shared in the compared measurement methods, a reversed picture of inflated divergent validity and repressed convergent validity may ensue.
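This confounding mechanism is easy to demonstrate in a toy simulation (fabricated data, not our measurements): two measures of the same trait that also share a method factor, such as a halo-effect, correlate more highly than the same measures without the shared factor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

trait = rng.normal(size=n)   # true trait score
method = rng.normal(size=n)  # method factor (e.g. a halo-effect) shared by both measures

# Two measures of the SAME trait that also share the method factor:
a_shared = trait + method + rng.normal(size=n)
b_shared = trait + method + rng.normal(size=n)

# The same two measures sharing only the trait:
a_clean = trait + rng.normal(size=n)
b_clean = trait + rng.normal(size=n)

r_shared = np.corrcoef(a_shared, b_shared)[0, 1]
r_clean = np.corrcoef(a_clean, b_clean)[0, 1]
print(round(r_shared, 2), round(r_clean, 2))  # shared method variance inflates the first
```

Under these assumptions the shared-method correlation is expected to be markedly higher (about 0.67 versus 0.50 for these variance choices), which is exactly the spurious boost to "convergent validity" described above.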
We have to accept that in our studies, the validity of our MAAS-MI Mental Health measure of interviewing skills can only be assured as far as the measurement characteristics of second-best instruments allow
Turning to our four methods, it is plausible on theoretical grounds that some methods will share method variance. MAAS-MI MH and MAAS-MI MH SELF show a similarity in item number and format, whereas MAAS-MI MH SELF and Global Self-Ratings have the self-evaluation aspect in common. MAAS-MI MH Global Expert-Ratings and Global Self-Ratings have a comparable item format and are both rating scales. This suggests the probability of method covariance artificially heightening the validity coefficients in the above-mentioned combinations of methods. The amount of method covariance is difficult to assess exactly, but the averaged correlations in the common heterotrait-heteromethod blocks may serve as an indication. These are low for the combinations MAAS-MI MH/MAAS-MI MH SELF (0.04) and MAAS-MI MH Global Expert-Rating/Global Self-Rating (0.08), but considerable for MAAS-SELF/Global Self-Rating (0.30).
On combining these three figures, we may conclude that the self-evaluation dimension which the MAAS-SELF and the Global Self-Rating have in common explains the high method covariance. This conclusion makes an inflation of the convergent validity between both methods probable.
It is surprising that method covariance between the MAAS-MI MH Global Expert- Ratings and Global Self-Ratings seems to be low, notwithstanding their high method variance. Both method variances apparently are of a different nature. The consequence may be an inflated divergent and a deflated convergent validity between both methods. The findings in Table 4 and 5 support this hypothesis.