4.3 Convergent & Divergent Validity MAAS-Mental Health Confirmed

Theoretical concepts of medical interviewing in primary health care underlie the MAAS Medical Interview Mental Health. But are we really measuring these concepts when we apply the MAAS method to interviewing skills?

We set out to find out by using a multi-trait, multi-method matrix. In this matrix, six traits underlying the interviewing skills needed in mental health care were measured by four methods.

The results indicate that the MAAS Medical Interview Mental Health is well suited to measuring interviewing skills in mental health, as evidenced by reasonable convergent and divergent validity and by relatively small method variance.

Kraan, H. F., & Crijnen, A. A. M. (1987). Convergent and divergent validity of the MAAS-Mental health. In H. F. Kraan & A. A. M. Crijnen (Eds.), The Maastricht History-taking and Advice Checklist – studies on instrumental utility (pp. 305–329). Lundbeck, Amsterdam.

Why a Multi-Trait Multi-Method Matrix?

The study of convergent and divergent validity is governed by the question: are the theoretical concepts of medical interviewing in primary health care that underlie the MAAS-MI MH really measured when we apply this method to interviewing skills?

For instance: are we really measuring the physician’s ability to present solutions when we apply the scale of the same name? We assume that with this scale we are measuring the physician’s ability to propose a treatment plan to the patient, to make the patient responsible for his choice and to negotiate about it, etc. We could, however, have measured, for instance, the quality of the treatment plan itself instead of the manner in which the physician presents it to the patient.

The ideal way to investigate this problem is to compare the measurements of this scale with those of a method which is known to measure the concept of Presenting Solutions. Conversely, a method not intended to measure Presenting Solutions should, of course, not measure this concept. These types of validity questions are known as convergent and divergent validity issues. For more detailed information, the reader is referred to Instrumental Utility.

In this chapter, we study the convergent and divergent validity of the MAAS-MI MH, comparing this method with three other methods: two self-rating methods and one expert-rating method.

  • We first briefly describe the methodology used to study the convergent and divergent validity: the multitrait-multimethod matrix (Campbell and Fiske, 1959).
  • Then, the multitrait-multimethod-matrix is studied by means of 4 criteria, stated by Campbell and Fiske, by which we judge the convergent and divergent validity of the MAAS-MH in comparison with the 3 other methods.
  • In the next paragraphs, the results are discussed and explained in terms of the measurement properties of the MAAS-MH and the three other methods.
  • Finally, we end this chapter with conclusions.

Convergent & Divergent Validity of the MAAS-MI Mental Health – A Multitrait-Multimethod Matrix

Earlier, the MTMM-matrix was defined as consisting of the correlations between multiple traits and multiple methods when each of the traits is measured with each of the methods. An instructive example of the most simple version of a MTMM-matrix is presented in Convergent & Divergent Validity — The Matrix Unravelled. This MTMM-matrix is used in the study of validity in order to distinguish the method from the trait variance in the scores. This distinction is a prerequisite for the interpretation of the correlation coefficients in the MTMM-matrix because these correlations always consist of a component of trait covariance and a component of method covariance. The magnitude of this latter component should be estimated in order to evaluate these correlations for their property of validity coefficients. 

In the following sections, the methods and the traits which have been used to fill the MTMM-matrix are described. An overview of methods and traits is given in Table 1. How the MTMM-matrix is analyzed is discussed in subsequent paragraphs.


Methods To Measure Interviewing Skills 

Four methods have been used: 

  • MAAS-MI Mental Health
  • MAAS-MI Mental Health Self
  • MAAS-MI Mental Health Global-Self-Rating Scale (GSRS)
  • MAAS-MI Mental Health Global-Expert-Rating Scale (GERS)

These methods, elaborated at length in Medical Interview & Related Skills and Tools and TOOLS Cont’d, are only briefly recapitulated here.

MAAS-MI Mental Health

In the MAAS-MH, trained observers indicate which of 104 clearly defined and discernible units of interviewing behavior occur in the course of a medical consultation. The 104 items of the MAAS-MH have been grouped according to the 8 theoretical dimensions (“traits”) of medical interviewing skills. These 8 groups of items are the scales of the MAAS-MH. The scores of the items within these scales have been combined to form construct indices on these theoretical dimensions.
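To make this index formation concrete, here is a minimal sketch in Python/pandas of summing item scores within scales; the item names, scale labels and values are purely illustrative and are not the actual MAAS-MH items.

```python
import pandas as pd

# Illustrative long table: one row per (interview, item) with the observed
# occurrence score, plus a mapping from items to MAAS-MH scales.
item_scores = pd.DataFrame({
    "interview": [1, 1, 1, 2, 2, 2],
    "item":      ["EE_01", "EE_02", "HT_01", "EE_01", "EE_02", "HT_01"],
    "score":     [1, 0, 1, 1, 1, 0],
})
item_to_scale = {"EE_01": "Exploration of Reasons for Encounter",
                 "EE_02": "Exploration of Reasons for Encounter",
                 "HT_01": "History-taking"}

# Construct index on a dimension = sum of the item scores within its scale
item_scores["scale"] = item_scores["item"].map(item_to_scale)
scale_scores = (item_scores
                .groupby(["interview", "scale"])["score"]
                .sum()
                .unstack(fill_value=0))
print(scale_scores)
```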

MAAS-MI Mental Health Self

The MAAS-SELF Mental Health was filled in by the interviewing physicians themselves at the end of the interview. On this checklist, they were asked which of the 92 items of interviewing behavior they performed in the preceding interview. 

The MAAS-MH and the MAAS-SELF have a similar item content and format, except for the scale “psychiatric examination”, where the items on the main classes of symptoms have been reformulated on a more abstract level. This change caused a reduction in the number of items. Nevertheless, the same 8 theoretical dimensions of medical interviewing are measured. The indices on these dimensions are constituted by combining the item scores within these 8 scales, as in the MAAS-MH.

MAAS-MI Mental Health Global-Self-Rating Scale

The Global Self-Rating Scale was rated by the interviewing physicians themselves after completion of the interview. They evaluated the quality of their interview on six theoretical dimensions and gave one global overall evaluation of the whole interview. Only broad definitions of these dimensions have been provided in the items. The items are to be rated on a 5-point Likert scale.

MAAS-MI Mental Health Global-Expert-Rating Scale

In the Global Expert-Rating Scale, a multidisciplinary panel of experienced care providers in mental health rated the quality of the interviewing skills of 40 residents in general practice on six theoretical dimensions (the same as in the previous method). In addition, they were expected to give one global overall evaluation of the whole interview.

Campbell and Fiske (1959) demanded that each of the methods used in the convergent and divergent validation procedure should be independent in order to minimize the influence of shared method variance. The MAAS-MI MH and the MAAS-MI Global Expert-Rating Scale might share method variance, both being observation instruments. In addition, the Global Self-Rating Scale and the MAAS-Self have the aspect of self-evaluation in common. Furthermore, the MAAS-MH and MAAS-SELF have a similar format of ‘behaviorally’ described items and the same index formation (by summation of the item scores within scales). Finally, the Global Expert-Rating Scale and the Global Self-Rating Scale also have a negative characteristic in common: only global definitions of items and no sharp criteria for scoring have been given.

Moreover, we have not included existing methods in our study because they differ in objectives of measurement. Too many differences in underlying traits (see next paragraph) would hamper the comparison of the methods.

Traits in Medical Interviewing in Mental Health

“Traits” is not used here in the sense of psychological traits; it signals the underlying theoretical dimension of a group of interviewing skills. These traits were extensively described in Medical Interview & Related Skills and MAAS Medical Interview Construction. We summarize the definitions and indicate how these traits are measured.

The Exploration of Reasons for Encounter (EE) measures the physician’s ability to clarify the patient’s complaint, to explore the motives in the pre-patient phase leading to the visit to the physician and, finally, to gain insight into the way the patient expects his needs to be met. It is measured with MAAS-MH and MAAS-SELF scales of the same name and in both Global Rating Scales with the item of the same name. 

History-taking (HT) enables the physician to generate explanatory hypotheses about the complaint/problem and to test these hypotheses. Furthermore, it enables the physician to generate hypotheses about interventions to alter the patient’s condition. In the MAAS-MI MH and MAAS-MI SELF, this trait is measured by combining the scales History-taking, Psychiatric Examination and Socio-emotional Exploration. In the Global Expert-Rating Scale and the Global Self-Rating Scale, the history-taking ability is measured by two items: one pertaining to the data that contribute to explanatory hypotheses and one pertaining to data necessary to generate hypotheses for treatment interventions. The scores on both items are taken as the trait measures in both methods.

During Presenting Solutions (PS), physicians convey information about causes and prognosis of the problem, negotiate with the patient about problem definition and possible solutions and provide concrete information about possible treatment interventions. In the MAAS-MI MH and MAAS-MI SELF, these skills are measured by the scales of the same name and, in both Global Rating Scales, by the items of the same name.

By skills to Structure the medical interview (STR) is understood the ability to open and to terminate the interview and to pass from one phase to another in a way that is perspicuous to the patient. 

Interpersonal Skills (IPS) aim to establish an optimal rapport with the patient. 

Communicative Skills (CS) should serve to promote an effective exchange of information between patient and physician. 

The last three traits are measured in the same manner as the Presenting Solutions trait. 

Campbell and Fiske (1959), however, demand that these traits be independent and therefore show near-zero to low intercorrelations. In our situation, these ideal circumstances are not met. Different traits sometimes have process aspects in common, such as questioning, conveying information, etc. Sometimes there is also an overlap in content aspects: for instance, between questioning about causal conditions of the mental health problems in the scales Exploration of the Reasons for Encounter, History-taking or Socio-emotional Exploration. On theoretical grounds, some correlative relationship between the traits may thus be expected.

Constructing and Analyzing the Multitrait-Multimethod Matrix

In principle, the MTMM-matrix presented in Table 2 was constructed as a completely crossed intercorrelation of the scores on the 6 traits of interviewing skills, each trait measured by the 4 methods. The scale scores were taken from the performance of 40 residents in general practice interviewing the two simulated mental health patients.

However, in order to improve the reliability of these scale scores, the following arrangements have been made:

  • The inter-rater reliability has been improved by adding the MAAS-scale scores obtained live during the simulated consultation hour to the MAAS-scale scores obtained by observers rating the videotaped interviews several months later. As a result, we have at our disposal a summated set of MAAS-scores from two independent, well-trained observers for each case. We used the Rasch homogeneous scales of the MAAS-MH in these validity studies, also taking advantage of their slightly better reliability figures. A similar summation to improve reliability was carried out with the Global Expert-Rating, for which two sets of scores (each by an independent observer) per case are also available.
  • The inter-case reliability has been enhanced by partializing the case influence out of the correlation matrix. Partialization has been carried out by including “case influence” in the original correlation matrix as a dummy variable, taking the value 1 for case 1 (depression) and the value 2 for case 2 (anxiety). The correlations with this dummy variable are then partialled out of the original matrix, and the resulting first-order correlations are used in the MTMM-matrix (Guilford et al., 1982, 6th ed.); a minimal sketch of this step is given below. After this partialization, it is permitted to add the scores of both cases.
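Partialling the case dummy out of the correlations amounts to computing first-order partial correlations. A minimal numpy sketch of this step follows; the variable names are hypothetical and this is only an illustration of the formula, not the original computation.

```python
import numpy as np

def partial_corr_matrix(scores, case_dummy):
    """First-order partial correlations between the columns of `scores`
    (the 24 trait-by-method variables), controlling for one covariate:
    the case dummy (1 = depression case, 2 = anxiety case)."""
    X = np.column_stack([scores, case_dummy])
    r = np.corrcoef(X, rowvar=False)
    r_xz = r[:-1, -1]          # zero-order correlations with the case dummy
    r_xy = r[:-1, :-1]         # zero-order correlations among the variables
    denom = np.sqrt(np.outer(1.0 - r_xz**2, 1.0 - r_xz**2))
    return (r_xy - np.outer(r_xz, r_xz)) / denom
```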

Before calculating the matrix, the missing data (less than 0.5% at the item level) in the MAAS-MI SELF and the MAAS-MI Global Self-Rating Scale have been estimated by regressing items with missing values (as dependent variables) on the items (as independent variables) with which they were most highly correlated (BMDP program: PAM-single). In the scores of the Global Expert-Rating Scale, no missing data have been detected.
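We do not have the original BMDP PAM routine at hand; the following is only a rough, hypothetical analogue of this single-predictor regression imputation, for illustration.

```python
import numpy as np
import pandas as pd

def impute_single_regression(items: pd.DataFrame) -> pd.DataFrame:
    """Replace each missing value by the regression prediction from the one
    other item with which the incomplete item correlates most highly."""
    out = items.copy()
    corr = items.corr().abs()
    for col in items.columns:
        if not items[col].isna().any():
            continue
        best = corr[col].drop(col).idxmax()       # most highly correlated item
        pair = items[[col, best]].dropna()
        slope, intercept = np.polyfit(pair[best], pair[col], 1)
        fill = items[col].isna() & items[best].notna()
        out.loc[fill, col] = intercept + slope * items.loc[fill, best]
    return out
```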

The multitrait-multimethod matrix has been built up from the intercorrelations of the trait scores obtained by each of these four methods. The resulting matrix counts 24 (6 [traits] x 4 [methods]) intercorrelated variables, yielding 288 (24×24/2) correlation coefficients.
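In code, this construction is little more than one large intercorrelation. A minimal pandas sketch, assuming the (partialized) trait scores are held in a wide table with a (method, trait) column MultiIndex; the labels are hypothetical abbreviations of the methods and traits introduced above.

```python
import pandas as pd

methods = ["MAAS-MH", "MAAS-SELF", "GSRS", "GERS"]
traits  = ["EE", "HT", "PS", "STR", "IPS", "CS"]

def build_mtmm(scores: pd.DataFrame) -> pd.DataFrame:
    """24 x 24 MTMM matrix: crossed intercorrelations of the 6 trait scores
    obtained with each of the 4 methods.  `scores` has one row per interview
    and columns indexed by pd.MultiIndex.from_product([methods, traits],
    names=["method", "trait"])."""
    return scores.corr()

def validity_diagonal(mtmm: pd.DataFrame, m1: str, m2: str) -> pd.Series:
    """Validity coefficients for one method pair: same trait, two methods."""
    return pd.Series({t: mtmm.loc[(m1, t), (m2, t)] for t in traits})
```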

To provide a better overview of the matrix, the means of the (validity) diagonals and of the triangles have been written down on the right-hand side of the grand diagonal, the axis of symmetry. These means have been calculated after transformation of the correlations into Fisher Z-scores. 
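Averaging in the Fisher Z metric can be sketched in a few lines (the input values below are illustrative only).

```python
import numpy as np

def mean_correlation(rs):
    """Average correlations in the Fisher z metric (arctanh), then transform
    the mean back to the correlation metric."""
    z = np.arctanh(np.asarray(rs, dtype=float))
    return float(np.tanh(z.mean()))

print(mean_correlation([0.21, 0.35, 0.19]))   # illustrative values only
```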

In studying the matrix, correlations of .19 or higher are taken as significant (p<.05; two-tailed test; N=80). This means that approximately 14 of the 288 correlations can be expected to be significant by chance.

This study of a MTMM-matrix proceeds along the lines of the four criteria proposed by Campbell and Fiske (Campbell and Fiske 1959; Schmitt and Stults, 1986) which have already been extensively described in chapter 8. We repeat them in the following section. 

If the investigated method (MAAS-MH) meets these four criteria, then a perfect convergent and divergent validity of this method has been attained.

Results

This section describes the study of the MTMM-matrix by means of the 4 criteria of Campbell and Fiske. Results are presented in Tables 3 – 5. In addition, a further part of this section is devoted to the phenomenon of method variance, a source of systematic variance in measures of interviewing skills whatever method is used.

The Matrix Discussed

Criterion 1: Convergent Validity Regarding Traits  

According to this criterion, the MTMM-matrix is judged by counting the number of significant correlations on the validity diagonals. These validity coefficients are the correlations between similar traits measured by different methods. When our four methods are compared with each other, each method has 18 validity coefficients in common with the three other methods. The significant coefficients are counted for each method (see Table 3).

Fifty-three percent of the validity coefficients are statistically significant. By comparison, only 3 or 4 correlations of the possible maximum of 72 would be significant by chance (p<.05; two-tailed test).
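The count itself is straightforward; a minimal sketch, reusing the hypothetical MultiIndex MTMM DataFrame sketched earlier and the significance threshold of .19 quoted above:

```python
from itertools import combinations

import pandas as pd

def count_significant_validities(mtmm: pd.DataFrame, methods, traits,
                                 r_crit: float = 0.19):
    """Per method, count how many of its 18 validity coefficients
    (6 traits x 3 other methods) reach the significance threshold."""
    counts = {m: 0 for m in methods}
    for m1, m2 in combinations(methods, 2):
        for t in traits:
            if abs(mtmm.loc[(m1, t), (m2, t)]) >= r_crit:
                counts[m1] += 1
                counts[m2] += 1
    return counts
```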

A closer view of these moderate findings reveals support for the MAAS-MI MH by the Global Expert-Ratings, except for the Exploration of the Reasons for Encounter. These findings have already been reported in the previous chapter, in which we also concluded that experts do not support the theoretical content of the scales Exploration of the Reasons for Encounter and Presenting Solutions, because of differences in theoretical orientation in the former and because of a lack of reliability at the item level in the latter.

Convergent Validity of MAAS-MI in Mental Health is supported by other measures of interviewing skills 

Furthermore, we note a lack of support from the MAAS-MI Global Self-Ratings. This is probably due to the deficient reliability and validity of global-rating scales, which has been widely noted in the measurement of other domains of medical competence (e.g. Streiner, 1985). The lack of validity is hardly due to the self-evaluation aspect of this method, because the MAAS-MI MH SELF supports the validity of the MAAS-MI MH at least in the scales Exploration of the Reasons for Encounter, History-taking and Presenting Solutions. It seems evident that interviewers are better able to evaluate their own technical, problem-solving skills in the three phases of initial interviews than the process aspects such as interpersonal and communicative skills.

Criterion 2: Divergent Validity Regarding Traits

This criterion requires the entries on the validity diagonal to be higher than the heterotrait-heteromethod values in the column and row in which a certain validity coefficient is located.

The entries on the validity diagonal mainly consist of covariance of the same trait measured by two different methods. In the heterotrait-heteromethod triangles, the correlations are mainly built up from covariance arising from pairs of different traits measured by these two different methods. This second criterion thus implies that the covariance of the same trait measured by different methods is compared with the covariance of that trait with a different trait. This comparison tests whether each trait can be distinguished from the others, with the contribution of the methods kept comparable.

To investigate this criterion, we counted the number of times an entry on the validity diagonal is higher than the entries in the related row and column of the two adjoining heterotrait-heteromethod triangles. Each entry on the validity diagonal is therefore compared with the values in two adjoining triangles, as shown in Table 4.
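A sketch of one plausible reading of this count, again assuming the hypothetical MultiIndex MTMM DataFrame from above; with six traits this yields at most 2 x 5 = 10 comparisons per cell, in line with the maximum given in the caption of Table 4.

```python
from itertools import combinations

import pandas as pd

def criterion2_counts(mtmm: pd.DataFrame, methods, traits):
    """For each method pair (A, B) and trait t, count how often the validity
    coefficient r(A_t, B_t) exceeds the heterotrait-heteromethod values in its
    row and column, i.e. r(A_t, B_s) and r(A_s, B_t) for all s != t."""
    counts = {}
    for A, B in combinations(methods, 2):
        for t in traits:
            validity = mtmm.loc[(A, t), (B, t)]
            rivals  = [mtmm.loc[(A, t), (B, s)] for s in traits if s != t]
            rivals += [mtmm.loc[(A, s), (B, t)] for s in traits if s != t]
            counts[(A, B, t)] = sum(validity > r for r in rivals)
    return counts
```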

These marginals of rows and columns are expressed as ratios between obtained and maximum possible scores (60), providing a measure of discriminating capability. In general, the marginals of the columns can be regarded as a “measure of quality” of a certain combination of methods, whereas the marginals of the rows are a measure of the divergent validity of a certain trait.

In Table 4, the sum scores of the rows reveal that the History-taking and Presenting Solutions traits show strong evidence of divergent validity, whereas the traits pertaining to Structuring the interview and to Interpersonal Skills only have a moderate divergent validity. Their validity coefficients contrast substantially with the comparable correlations in the heterotrait-heteromethod triangles, irrespective of the methods used. The traits of Exploration of the Reasons for Encounter and of Communicative Skills again show insufficient evidence of divergent validity. 

Looking at the marginals of the columns of Table 4, it turns out that combinations of the MAAS-MI MH, MAAS-MI SELF and the MAAS-MI Global Expert-Rating exhibit the best ability to discriminate the six traits. The relatively high ability of the MAAS-MI SELF and the low ability of the MAAS-MI Global Self-Ratings to discriminate the traits are noteworthy.

In this respect, the MAAS-MI MH Global Self-Ratings suffer, apart from error variance, from two other sources of unreliability due to the characteristics of global-rating scales (see MAAS Medical Interview Construction): high method variance and poor operationalization of the underlying theoretical concepts (content validity). Under the headings of criteria 3 and 4, we discuss this subject in greater depth.

Global ratings of interviewing skills in Mental Health suffer from poor operationalization of concepts 

Criterion 3: Divergent Validity Regarding Methods 

According to this criterion, the validity coefficients must be higher than the off-diagonal correlations in their monomethod triangle. Measures of similar traits obtained by different methods should intercorrelate more highly than measures of different traits obtained by the same method. When the validity coefficients are lower than the off-diagonal correlations in their monomethod triangle, there is evidence that the traits are highly intercorrelated and/or that there is a high (confounding) method variance (Campbell and Fiske, 1959; Schmitt and Stults, 1986).

To study this criterion, we counted the number of times that the three validity coefficients for each trait are higher than the mean of the heterotrait-monomethod triangles (see Table 5).

For each trait, three validity coefficients arise when the measurement of this trait by one method is correlated with the measurement of this trait obtained by the three other methods. 
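A sketch of one plausible reading of this count (each validity coefficient of a method is compared with the mean of that method’s own heterotrait-monomethod triangle; the original procedure may differ in detail):

```python
from itertools import combinations

import numpy as np
import pandas as pd

def criterion3_counts(mtmm: pd.DataFrame, methods, traits):
    """For each method m and trait t, count how many of the three validity
    coefficients r(m_t, other_t) exceed the mean off-diagonal correlation of
    m's heterotrait-monomethod triangle (maximum 3 per cell, as in Table 5)."""
    mono_mean = {m: np.mean([mtmm.loc[(m, a), (m, b)]
                             for a, b in combinations(traits, 2)])
                 for m in methods}
    counts = {}
    for m in methods:
        for t in traits:
            validities = [mtmm.loc[(m, t), (other, t)]
                          for other in methods if other != m]
            counts[(m, t)] = sum(v > mono_mean[m] for v in validities)
    return counts
```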

Looking at the column marginals, which can be considered a measure of the divergent validity of a certain method, we notice that the MAAS-MI MH meets this criterion 22.2% of the time, the MAAS-MI SELF about 16.7% of the time, the Global Expert-Rating Scale 11% of the time and the Global Self-Rating Scale never. The row marginals can be taken as a measure of the divergent validity of each trait. These row marginals reveal that History-taking meets this criterion 50% of the time, whereas the other traits vary from never to twice meeting this criterion.

These results, which seem rather modest for the MAAS-MH, have to be judged in the light of some critical methodological remarks as to this third criterion. In the application of this criterion, Campbell and Fiske have aimed to compare method variance with trait variance, considering the validity coefficient as common trait variance and the values in the heterotrait-monomethod triangle as results of method variance. This statement does not entirely hold true in our situation. 

First, in the discussion of the second criterion, we noticed the confounding of the validity coefficients with shared variance between the two methods compared. This method covariance is highly probable in the combination of the MAAS-MI MH SELF and the MAAS-MI MH Global Self-Ratings because of the comparability of both self-evaluation methods.

Second, the off-diagonal correlations in the monomethod triangles, which might be indicative of the method variance, may be confounded and inflated by existing trait intercorrelations. It is highly improbable that the traits are independent of each other, given that they fit one unidimensional Rasch scale (see Scalable, Reliable, Generalizable).

To summarize: this criterion is difficult to meet because the method variance, which is already fairly high, has, in addition, been inflated with both afore-mentioned influences. 

As stated in Convergent & Divergent Validity — The Matrix Unravelled, Fiske (1971) approached this criterion with greater subtlety, accepting method influences as an inseparable aspect of measurement. In our opinion, measurements are not invalidated when the correlations in the heterotrait-monomethod triangles exceed the corresponding validity diagonals. Researchers, however, must be aware that a substantial degree of variance in their measurements is attributable to the measurement method. In such situations, additional studies, such as the generalizability analyses, are needed to determine the magnitude of method variance components.

Criterion 4: Divergent Validity in Traits & Methods 

Following the recommendations of authors advocating the use of factor analysis in the study of the MTMM-matrix (Schmitt and Stults, 1986), we studied this criterion with a two-fold factor-analytic approach.

First Hypothesis

First, the scores on the 6 traits are factor analyzed for each of the four methods. Second, the six heteromethod blocks of the MTMM-matrix are factor analyzed. In these blocks, all combinations of the two sets of 6 traits, each measured with two methods, are factor analyzed.

We now restate this fourth criterion as two hypotheses, which should be confirmed on inspection of both sets of factor structures:

  1. The factor structure of the 6 traits found within each of the four measurement methods should be similar when we measure the same interviews with each of these methods.
  2. The factor structure of the traits in the six heteromethod blocks of the MTMM-matrix should be such that similar traits, measured by different methods, load on the same factor.

It goes without saying that we measure the same interviews with each method. An illustration makes this clearer: take, for instance, the block in which the six traits are measured with the MAAS-MI MH Global Expert-Rating and with the MAAS-MI MH Global Self-Rating. When an important factor is found with high loadings of Interpersonal and Communicative Skills measured by the expert ratings, the factor loadings of both traits measured by the self-ratings should also be considerable on this same factor, since the same interviews are measured with each of our four methods.

The first hypothesis has been tested by factor analyzing the trait scores of the MAAS-MI MH SELF (N=80), the MAAS-MI MH Global Self-Ratings (N=80) and the MAAS-MI MH Global Expert-Ratings (N=160). With the standard SPSS program (Nie et al., 1975), principal component analysis was carried out, followed by Varimax rotation of the factors with an eigenvalue exceeding 1.
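The same kind of analysis can be reproduced outside SPSS. Below is a minimal numpy sketch of principal component analysis with the eigenvalue-greater-than-one rule and a Varimax rotation; the `trait_scores` array (one row per interview, one column per trait) is hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation (Kaiser, 1958) of a p x k factor-loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated**3 - rotated * (rotated**2).sum(axis=0) / p))
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

def pca_eigenvalue_one(trait_scores):
    """Principal components of the trait intercorrelations; factors with an
    eigenvalue exceeding 1 are retained and Varimax-rotated if more than one
    remains."""
    corr = np.corrcoef(trait_scores, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > 1.0                      # eigenvalue-greater-than-one rule
    loadings = eigvec[:, keep] * np.sqrt(eigval[keep])
    return varimax(loadings) if keep.sum() > 1 else loadings
```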

In Table 6, we present the factor structures obtained by measuring the 6 traits with each of the four methods. Factors with an eigenvalue of more than 1 and factor loadings higher than 0.6 are indicated, the loadings listed in order of decreasing magnitude.

In the factor structure of the MAAS-MI MH, the traits Interpersonal and Communicative Skills are most prominent in the main factor, whereas the traits Exploration of the Reasons for Encounter, History-taking and Presenting Solutions load on the second factor. This means that in the MAAS-MH, measurement of the process aspects of interviewing predominates over measurement of the content of the interview.

The MAAS-MI MH SELF shows a strong method variance, accounted for by the single factor found in the principal component analysis (Saal et al., 1980; Dielman et al., 1980). This factor shows a pattern different from the factor structure of the MAAS-MH. Measurement of patient- and physician-centered information collection is the most striking measurement property, as witnessed by high factor loadings on the traits Exploration of the Reasons for Encounter and History-taking.

The factor structure of the MAAS-MI MH Global Self-Ratings takes a middle-of-the-road position between the patterns of the MAAS-MH and the MAAS-SELF. Principal component analysis also yields one factor (method variance) that shows a mixed pattern of loadings on the traits Exploration of the Reasons for Encounter and History-taking and, to a lesser extent, Communicative Skills. This pattern seems to represent a measurement property of this method, stressing the effective collection of patient- and physician-centered data pertaining to the presented mental health problem.

The factor structure of the MAAS-MI MH Global Expert-Ratings reveals a completely different picture. The single factor found evidences considerable method variance, and its pattern of factor loadings differs from that of the three other methods: the traits Communicative Skills, Exploration of the Reasons for Encounter, Presenting Solutions and Structuring the interview load on this factor.

The interpretation of this factor structure may indicate the effective collection of patient-centered data by the physician. This interpretation is comparable with that of the factor structure of the MAAS-MH.

These findings permit the following conclusions to be drawn:

According to the factor structures and patterns of factor loadings, the trait interrelationships found in the four measurement methods show patterns that differ from each other. However, the MAAS-MI MH SELF and Global Self-Rating on the one hand, and the MAAS-MI MH and Global Expert-Rating on the other, show some similarities in their patterns.

This finding implies that our first hypothesis, concerning the similarity of the trait interrelationships across the four measurement methods, is not confirmed.

Second Hypothesis

The second hypothesis is studied by factor analyzing the six heteromethod blocks of the MTMM-matrix (see Table 2). The standard program of SPSS as used in testing the previous hypothesis has again been used. 

When we examine the resulting factor structures with their major factor loadings, it is evident that our second hypothesis is not confirmed. In all six heteromethod blocks, the same pattern arises: the traits loading on one factor are clusters of different traits measured by one method instead of pairs of the same trait measured by two different methods. We again take our previously mentioned example of the heteromethod block in which the six traits are measured with the MAAS-MI MH Global Expert- and Global Self-Rating Scales (see Table 7).

We would expect factors with substantial loadings consisting of paired traits, for example, in factor I: EE, HT, CS, EE, HT, CS. We see, however, that EE, HT and CS (measured by the Global Expert-Rating Scale) and EE, HT and CS (measured by the Global Self-Rating Scale) load on two different factors instead of one. A similar pattern is notable in all six heteromethod blocks.

This finding is due to the high method variance in our methods, resulting in the clusters of different traits measured by the same method in the factors we obtained. The next section is devoted to this important issue of method variance. 

Nevertheless, we have to draw the conclusion that we cannot meet this fourth criterion for convergent and divergent validity. 

Method Variance and Convergent & Divergent Validity in Mental Health

Method variance is not of mere theoretical importance, but can be attributed to effects that are very common in measurement practice: leniency and halo-effects, the latter being the most significant. Halo-effects are consistently conceptualized as an observer’s failure to discriminate among conceptually distinct aspects of a subject’s behavior. Halo-effects are larger when variables have a moral connotation (such as our items pertaining to “interpersonal and communicative skills”) or when single variables are not easily observed and/or are ill-defined (Streiner, 1985, citing Allport, 1937). Under research conditions susceptible to halo-effects, observers seem to rate an overall impression of the subjects. As a result, observers are barely able to assess more than one or two dimensions accurately, and all items are consequently associated with each other (Thorndike, 1920; Guilford, 1954; Saal et al., 1980). Global rating scales are therefore particularly susceptible to halo-effects (Streiner, 1985).

Global rating scales rate an overall impression and are particularly susceptible to halo-effects

The problem is to assess the magnitude of method variance in our four instruments. Our design does not allow an exact assessment, but we study in greater depth the manifestations of method variance which we have already encountered in this chapter.

First, in the previous section we cited Saal et al. (1980) and Dielman et al. (1980), who state that the finding of one single factor in principal component factor analysis suggests a high method variance. Principal component analyses of the trait scores of the MAAS-MI MH SELF, the Global Self-Ratings and the Global Expert-Ratings each yield one factor with an eigenvalue higher than one (cf. Table 6). By contrast, in the MAAS-MI MH, two factors arise after principal component analysis of the traits.

Second, the magnitude of the averaged correlations between the traits in the monomethod triangles is an indication of the amount of method variance. This evidence is even stronger when it is assumed that the intercorrelation of theoretically different traits is zero to low. In the MTMM-matrix, the averaged correlations between the traits in the monomethod triangles are considerable for the MAAS-MI MH SELF, the Global Self-Ratings and the Global Expert-Ratings (0.38, 0.32 and 0.39 respectively), whereas the averaged intercorrelations between the traits in the corresponding triangle of the MAAS-MH are lower (0.25).

Third, we notice in the factors of the heteromethod blocks a clustering of different traits measured with the same method, instead of clustering of similar traits measured with two different methods. This indicates that the clustering of different traits in one factor is due to the method variance, which these trait measures share. 

These findings suggest considerable method variance, mainly caused by halo-effects, in all methods, though it is least present in the MAAS-MI MH.

MAAS-MI in Mental Health, where items are defined and well-operationalized, is best suited to measure interviewing skills in Mental Health

This characteristic of the MAAS-MI MH is the consequence of the constructors’ efforts to define and operationalize the interviewing skills behaviorally by expressing criteria for scoring in single or multiple behavioral acts. Nevertheless, this conclusion is rather contradicted by the considerable method variance in the MAAS-MI MH SELF, especially when we consider that the MAAS-MH and the MAAS-SELF have almost identical items. We explain this higher method variance of the MAAS-MI MH SELF by the time lag of about 10-20 minutes between the interview and the rating of the checklist, causing deficient recall of the performed interviewing behavior and leading to the induction of halo-effects. This deficient recall may impair discrimination between the questions the physicians themselves asked and the topics spontaneously raised by the patient.

The MAAS-MI MH Global Self- and Global Expert-Ratings suffer from method variance because global rating scales are vulnerable to halo-effects (see MAAS Medical Interview Construction). It is striking, however, that the MAAS-MI MH Global Expert-Ratings show the highest halo-effect (as witnessed by the correlations in the corresponding monomethod triangles). These findings are rather disappointing, as we expected the experts to be very familiar with the concepts of interviewing in primary mental health care because of their own teaching and health-care experience in this domain.

An important consequence of high method variances may be their confounding effects in validity research. When methods are compared that share a high method variance of the same type, their intercorrelations may be too high, spuriously boosting convergent validity and attenuating divergent validity. In the case of high method variance which is not shared in the compared measurement methods, a reversed picture of inflated divergent validity and repressed convergent validity may ensue.

We have to accept that in our studies, the validity of our MAAS-MI Mental Health measure of interviewing skills can only be assured as far as the measurement characteristics of second-best instruments allow

Turning to our four methods, it is plausible on theoretical grounds that some methods share method variance. The MAAS-MI MH and MAAS-MI MH SELF show a similarity in item number and format, whereas the MAAS-MI MH SELF and the Global Self-Ratings have the self-evaluation aspect in common. The MAAS-MI MH Global Expert-Ratings and Global Self-Ratings have a comparable item format and are both rating scales. This suggests the probability of method covariance artificially heightening the validity coefficients in the above-mentioned combinations of methods. The amount of method covariance is difficult to assess exactly, but the averaged correlations in the common heterotrait-heteromethod triangles may serve as an indication. These are low for the combinations MAAS-MI MH/MAAS-MI MH SELF (0.04) and MAAS-MI MH Global Expert-Rating/Global Self-Rating (0.08), but considerable for MAAS-SELF/Global Self-Rating (0.30).

On combining these three figures, we may conclude that the self-evaluation dimension which the MAAS-SELF and the Global Self-Rating have in common explains the high method covariance. This conclusion makes an inflation of the convergent validity between both methods probable. 

It is surprising that the method covariance between the MAAS-MI MH Global Expert-Ratings and Global Self-Ratings seems to be low, notwithstanding their high method variances. Both method variances are apparently of a different nature. The consequence may be an inflated divergent and a deflated convergent validity between both methods. The findings in Tables 4 and 5 support this hypothesis.

Concluding Remarks

The convergent and divergent validity of the MAAS-MI Mental Health has been studied by means of the multitrait-multimethod matrix. In this matrix, six traits underlying the interviewing skills necessary in mental health care have been measured simultaneously by four methods. 

The six traits measured by scales of the MAAS-MH are:

  • Exploration of the Reasons for Encounter
  • History-taking, including
    • Psychiatric Examination
    • Socio-emotional Exploration
  • Presenting Solutions 
  • Structuring the interview
  • Interpersonal Skills
  • Communicative Skills 

The four methods involved are:

  • MAAS-MI MH, a method to observe 104 single and complex interviewing skills
  • MAAS-MI MH SELF, a self-rating method for 92 single and complex interviewing skills
  • MAAS-MI MH Global Self-Rating Scale, Likert-type self-rating scales for medical interviewing
  • MAAS-MI MH Global Expert-Rating Scale, Likert-type evaluative scales for six dimensions of medical interviewing, to be rated by experts.

The items in these methods are all operationalizations of the 6 previously-mentioned traits.

The scores used in this study were obtained from an experiment in which 40 residents in general practice each interviewed two simulated patients, one with a major depression and one with a panic disorder. In order to remove “case influence” from the correlations of the matrix, this effect has been partialized out by means of a dummy variable representing “case influence”. The resulting first-order correlations have been taken as the MTMM-matrix for study.

To study the convergent and discriminant validity, the four criteria developed by Campbell and Fiske (1959) have been applied to the matrix.

Convergent Validity: Traits

Convergent validity of the MAAS-MI MH has been reasonably supported by the MAAS-MI MH SELF and the Global Expert-Rating Scale. All six traits of the MAAS-MI MH have been supported by the Global Expert-Ratings except for the Exploration of the Reasons for Encounter and Presenting Solutions. Surprisingly, these traits, in combination with History-taking, are well corroborated by the MAAS-MI MH SELF.

This self-evaluation variant of the MAAS shows a better convergent validity regarding the History-taking (problem-solving skills) than regarding the Interpersonal and Communicative Skills. The MAAS-MI MH Global Self-Rating Scale does not contribute to convergent validity because of the insufficient operationalization of theoretical concepts in items leaving too much room for substantial halo-effects.

Divergent Validity: Traits

Divergent validity regarding the traits (the second criterion of Campbell and Fiske) is reasonably met: History-taking, Presenting Solutions and, to a lesser extent, Structuring the interview and Interpersonal Skills, are distinguishable traits in all the methods except for the MAAS-MI MH Global Self-Rating Scale. The validity here of the trait Exploration of the Reasons for Encounter is also questionable. From chapter 11, we already know that this phenomenon is due to confusion about the underlying theoretical concepts of this scale. It is striking that the MAAS-MI MH SELF has a slightly better ability to distinguish this trait than the MAAS-MI MH and the Global Expert-Ratings. The Global Self-Rating Scale performs poorly in discriminating the traits because of its insufficient measurement properties. 

Divergent Validity: Methods

Divergent validity in terms of separating trait from method variance (the third criterion) is best for the MAAS-MH. Method variance is discriminated from trait variance in the traits underlying History-taking and, to a lesser but still surprisingly high extent, in Communicative Skills. Nevertheless, the “performance” of all four methods on this criterion is, in an absolute sense, moderate (MAAS-MH) to poor.

Convergent & Divergent Validity: Patterns in Trait Interrelationships

The fourth criterion of divergent and convergent validity requires a similarity in the patterns of trait interrelationships in all heterotrait triangles. This criterion has not been met, for two reasons. First, factor analysis reveals that the traits of interviewing skills measured by the four different methods each show a different pattern of intercorrelation. Second, the high method variance, mainly due to halo-effects, has a disturbing influence. This method variance is lowest in the MAAS-MH because of its behaviorally defined items.

MAAS-MI Mental Health best measure of interviewing skills in mental health

The MAAS-MH turns out to be comparatively the best measure of physicians’ interviewing skills in Mental Health Care, evidenced by reasonable convergent and divergent validity and by a relatively small method variance. 

MAAS-MI Mental Health turns out to be the best measure of interviewing skills in Mental Health

The factor analyses of the scores on the interviews with the four methods also give more insight into their measurement properties. In the MAAS-MI MH, a factor pattern arises which indicates that its measurement properties stress, in particular, the Interpersonal and Communicative Skills and, to a lesser extent, the content aspects of the three phases of the initial interview: Exploration of the Reason for Encounter, History-taking and Presenting Solutions. This pattern suggests that it assesses good medical interviewing as a balanced combination of process and content aspects. 

In both self-evaluation methods, the MAAS-MI MH SELF and the Global Self-Ratings, the factor patterns resemble each other. They reveal an accent on the measurement of the accurate collection of patient- and physician-centered data. 

The factor pattern of the MAAS-MI MH Global Expert-Ratings presents a quite different picture, stressing the Communicative Skills in gathering patient-centered information and in Presenting Solutions. It is clear that each method stresses its own priorities in measurement.

Table 1 -- Methods and traits in the measurement of physicians’ interviewing skills in MAAS-MI Mental Health used in MTMM-matrix
Table 2 -- Multitrait-multimethod matrix, consisting of the crossed (partial) correlations of 6 traits of interviewing skills in mental health, each measured by 4 methods
Table 3 -- Number of significant correlations (p<.05, 2-tailed test) on the validity diagonal between each method and the three other methods (maximum =18)
Table 4 -- Number of times where the entries on the validity diagonal are higher than the heterotrait-heteromethod values in the corresponding column and row (maximum per cell = 10; maximum per row or column = 60)
Table 5 -- Number of times that the three validity values of each trait are higher than the mean of a heterotrait-monomethod triangle (maximum per cell = 3; maximum per row = 12; maximum per column = 18)
Table 6 -- Overview of the factor structures obtained by measuring the 6 traits underlying interviewing with four methods of measurement (factors with eigenvalue > 1; factor loadings > .6)
Table 7 -- Factor structures with their major (>.6) loadings on six interviewing traits obtained by factor-analyzing the heteromethod block of the Global Self-Rating and the Global Expert-Rating scales of the MAAS-MI MH MTMM-matrix

References 

Allport GW. Personality: a psychological interpretation. Holt Co., New York, 1937.

Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychol. Bull., 1959; 56: 81-105. 

Dielman TE, Hall A, Davis WK. Psychometric properties of clinical performance ratings. Evaluation and the Health Professions, 1980; 3: 103-117. 

Fiske DW. Measuring the concepts of personality. Chicago, 1971.

Guilford JP. Psychometric methods. McGraw-Hill, New York, 1954 (2nd ed.).

Guilford JP, Fruchter B. Fundamental statistics in psychology and education. McGraw-Hill, Auckland, 1982 (6th ed.).

Nie NH, Hull CH, Jenkins JG, Steinbrenner K, Bent DH. Statistical package for the social sciences (SPSS). McGraw-Hill, New York, 1975 (2nd ed.).

Saal FE, Downey RG, Lahey MA. Rating the ratings: assessing the psychometric quality of rating data. Psychol. Bull., 1980; 88: 413-428.

Schmitt N, Stults DM. Methodology review: analysis of multitrait-multimethod matrices. Applied Psychological Measurement, 1986; 10: 1-22.

Streiner DL. Global rating scales. In: Neufeld VR, Norman GR (Eds.). Assessing clinical competence. Springer Publ., New York, 1985.

Thorndike EL. A constant error in psychological ratings. J. Appl. Psychol., 1920; 4: 25-29.