3.2 Convergent & Divergent Validity – The Matrix Unravelled

Little attention has been paid to testing the validity of methods for measuring medical-interview performance.

We therefore decided to test both the convergent and the divergent validity of the MAAS Medical Interview (MAAS-MI). To do this, we used a multitrait-multimethod matrix, which allowed us to measure several dimensions of medical interviewing skills with several methods.

The results indicated that, of the four instruments compared, the MAAS-MI is the best measure of medical interviewing skills: it shows both convergent and divergent validity, and it is the least influenced by the method of measurement.

 

Crijnen, A. A. M., & Kraan, H. F. (1987). Convergent and divergent validity of four methods of measurement of medical interviewing skills: a multitrait-multimethod approach. In H. F. Kraan & A. A. M. Crijnen (Eds.), The Maastricht History-taking and Advice Checklist – studies of instrumental utility (pp. 203–231). Lundbeck, Amsterdam.

The Matrix Constructed & Unravelled

Establishing convergent and divergent validity is one of the procedures suggested by Cronbach and Meehl (1955), and further elaborated by Campbell and Fiske (1959), for validating the meaning of measurements. It presumes that when two tests measure the same construct, they will correlate substantially.

Validity presumes that tests measuring the same construct correlate substantially

Campbell and Fiske argue that demonstrating convergent and divergent validity requires that several measurements of the same construct confirm its meaning while, at the same time, measurements of other constructs support their distinct character.

  • Convergent validity is then indicated by substantial correlations between independent measurements of similar constructs;
  • Divergent validity is indicated by low correlations between measurements of unrelated constructs.

Essential Notion

The essential notion in Campbell and Fiske’s approach is that each measurement is a combination of a trait and a method: it combines a particular content with a measurement procedure that is not specific to that content. They therefore suggest determining the relative contributions of trait and method variance to measurements by including more than one trait as well as more than one method in the validation process.
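To make the idea of trait and method variance concrete, the Python sketch below simulates scores that are the sum of a trait component, a shared method component and random error, for two traits (1, 2) and two methods (A, B). The loadings (a method weight of 0.7 and unit-variance trait, method and error terms) are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=5000, method_loading=0.7, seed=1):
    """Simulate score = trait + method_loading * method factor + error for
    traits 1, 2 and methods A, B. All loadings are hypothetical."""
    rng = random.Random(seed)
    scores = {key: [] for key in ("A1", "A2", "B1", "B2")}
    for _ in range(n):
        trait = {"1": rng.gauss(0, 1), "2": rng.gauss(0, 1)}
        method = {"A": rng.gauss(0, 1), "B": rng.gauss(0, 1)}
        for m in "AB":
            for t in "12":
                scores[m + t].append(trait[t] + method_loading * method[m] + rng.gauss(0, 1))
    return scores

s = simulate()
r_validity = pearson(s["A1"], s["B1"])    # same trait, different methods
r_monomethod = pearson(s["A1"], s["A2"])  # different traits, same method
r_heteroboth = pearson(s["A1"], s["B2"])  # different traits, different methods
print(f"{r_validity:.2f} {r_monomethod:.2f} {r_heteroboth:.2f}")
```

With these assumed loadings each score has variance 1 + 0.49 + 1 = 2.49, so the population correlations are about 1/2.49 ≈ .40 for the same trait measured by different methods, .49/2.49 ≈ .20 for different traits sharing a method, and 0 for different traits measured by different methods. Shared method variance alone is enough to make distinct traits correlate.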

Each measure forms a combination of a trait and a method

Support for convergent and divergent validity is obtained by constructing a multitrait-multimethod matrix (MTMM-matrix) containing the correlations between a number of traits, each measured by several methods. This matrix is then examined according to four criteria proposed by Campbell and Fiske.

Campbell and Fiske’s criteria for convergent and divergent validity

  1. The convergence of the traits
  2. The divergence of the traits
  3. The divergence of the methods
  4. A pattern of relations between the traits

Several authors recommend examining the multitrait-multimethod matrix as an ideal validation procedure in test development (Kerlinger, 1981; Thorndike, 1982). Despite these recommendations, the MTMM-validation procedure has rarely been applied in psychological research because of its demanding criteria.

Applied to MAAS Medical Interview

In the present study, we investigate the convergent and divergent validity of the MAAS-MI in General Medicine:

  • In addition to the MAAS-MI General, which measures six dimensions of physicians’ interview behavior, three other methods of measuring medical interviewing skills are applied to assess the quality of physicians’ interviewing skills.
  • The multitrait-multimethod matrix, consisting of the correlations among the six traits as measured by each of four methods, is constructed and then examined according to Campbell and Fiske’s criteria.

Campbell & Fiske’s Criteria

In this section, we describe how the multitrait-multimethod matrix is constructed and present Campbell and Fiske’s criteria and their relation to the matrix. An MTMM-matrix consists of the correlations obtained when each of several traits is measured by each of several methods.

Table 1 displays an example of an MTMM-matrix for two traits (1, 2) and two methods (A, B).

Table 1 -- Theoretical Multitrait Multimethod Matrix for Two Traits (1, 2) and Two Methods (A, B)

Each trait is measured by each method, and all resulting scores are then correlated with one another.

  • The values on the main diagonal (rA1A1, rA2A2, rB1B1, rB2B2) are reliability indices, mostly coefficients of internal consistency.
  • The diagonal in the Method A/Method B block, called the validity diagonal, contains the correlations between the same trait measured by different methods (rA1B1, rA2B2).
  • The values in the A1/A2 triangle (rA1A2) and the B1/B2 triangle (rB1B2) form the heterotrait-monomethod triangles.
  • The values in the A1/B2 triangle (rA1B2) and the A2/B1 triangle (rA2B1) form the heterotrait-heteromethod triangles.
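The classification above depends only on whether two measurements share a trait, a method, both or neither. A minimal Python sketch of that rule follows, assuming each measurement is written as a (method, trait) pair; the function names are hypothetical.

```python
from itertools import combinations

def cell_type(var_i, var_j):
    """Classify the MTMM cell for two variables, each a (method, trait) pair such as ("A", 1)."""
    (method_i, trait_i), (method_j, trait_j) = var_i, var_j
    if method_i == method_j and trait_i == trait_j:
        return "reliability diagonal"
    if trait_i == trait_j:
        return "validity diagonal"
    if method_i == method_j:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

def count_cells(methods, traits):
    """Count the off-diagonal cells of each type in an MTMM matrix."""
    variables = [(m, t) for m in methods for t in traits]
    counts = {}
    for vi, vj in combinations(variables, 2):
        kind = cell_type(vi, vj)
        counts[kind] = counts.get(kind, 0) + 1
    return counts

# The 2 x 2 example of Table 1
for vi, vj in combinations([(m, t) for m in "AB" for t in (1, 2)], 2):
    print(f"r{vi[0]}{vi[1]}{vj[0]}{vj[1]}: {cell_type(vi, vj)}")
```

For Table 1 this labels rA1B1 and rA2B2 as validity values, rA1A2 and rB1B2 as heterotrait-monomethod, and rA1B2 and rA2B1 as heterotrait-heteromethod. For four methods and six traits, `count_cells("ABCD", range(1, 7))` gives 36 validity values, 60 heterotrait-monomethod values and 180 heterotrait-heteromethod values.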

Convergent and divergent validity are assessed by inspecting the MTMM-matrix according to the four Campbell and Fiske criteria (Campbell & Fiske, 1959; Schmitt et al., 1986).

First Criterion: Convergent Validity

The first criterion refers to the convergence of independent methods in measuring the same trait. It states that values on the validity diagonal should be large enough to warrant further examination of validity. As a minimal requirement, correlations on the validity diagonals (rA1B1 and rA2B2) ought to be statistically significant.
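The significance requirement can be checked with the usual t test for a correlation coefficient, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The Python sketch below also inverts that formula to obtain the smallest significant r. The choice of a one-tailed test is an assumption: with n = 33, one-tailed critical values reproduce the thresholds r > .29 (p < .05) and r > .40 (p < .01) given with Table 5, but the source does not state which tail was used.

```python
import math

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0 with a sample correlation r (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def critical_r(t_crit, n):
    """Smallest correlation reaching significance, given the critical t for df = n - 2."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

def is_significant(r, n, t_crit):
    """Criterion 1 check for one validity value (one-tailed, positive direction)."""
    return t_statistic(r, n) > t_crit

# Critical t values for df = 31, copied from a standard t table (one-tailed).
T_05, T_01 = 1.696, 2.453
print(round(critical_r(T_05, 33), 2), round(critical_r(T_01, 33), 2))  # 0.29 0.4
```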

Second Criterion: Divergent Validity

The second criterion pertains to the verification of distinctions between traits. This criterion states that values on the validity diagonal should be higher than the heterotrait-heteromethod correlations of the column and row in which the individual validity value is located. This criterion can be studied by examining whether a correlation on the validity diagonal (rA1B1) exceeds the strength of the correlations on the adjoining column and row (rA1B2 and rA2B1). 
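The second criterion reduces to a counting rule on each heteromethod block. The Python sketch below is one possible implementation, assuming the block is stored as a nested list whose rows are the traits measured by one method and whose columns are the traits measured by the other; the correlations in the example are invented.

```python
def criterion2_counts(block):
    """Criterion 2 for one heteromethod block.

    block[i][j] is the correlation between trait i measured by method X and
    trait j measured by method Y, so block[i][i] lies on the validity diagonal.
    For each trait, return how many heterotrait-heteromethod values in its row
    and column the validity value exceeds (at most 2 * (k - 1) comparisons).
    """
    k = len(block)
    counts = []
    for i in range(k):
        validity = block[i][i]
        row = [block[i][j] for j in range(k) if j != i]
        column = [block[j][i] for j in range(k) if j != i]
        counts.append(sum(validity > r for r in row + column))
    return counts

# Hypothetical 2 x 2 block laid out as in Table 1: rows A1, A2; columns B1, B2.
block = [[0.55, 0.20],   # rA1B1, rA1B2
         [0.60, 0.45]]   # rA2B1, rA2B2
print(criterion2_counts(block))  # [1, 1]
```

With six traits, each validity value is compared with 2 × 5 = 10 heterotrait-heteromethod values.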

Third Criterion: Trait Versus Method Variance

The third criterion determines the extent to which method variance contributes to the scores. It states that a value on the validity diagonal must be higher than the values in the heterotrait-monomethod triangles of the methods involved: variables should correlate more highly with measurements of the same trait obtained by different methods than with measurements of other traits obtained by the same method. This criterion can be studied by examining whether correlations on the validity diagonals (rA1B1 or rA2B2) exceed the correlation(s), or their median, in the heterotrait-monomethod triangles (rA1A2 or rB1B2).
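Using the median as the comparison value, the third criterion can be sketched in Python as below. The example assumes a monomethod block stored as a symmetric nested list and uses invented correlations. With four methods, each method takes part in three validity diagonals, so each trait has three validity values per method to compare.

```python
from statistics import median

def heterotrait_values(mono_block):
    """Lower-triangle (heterotrait) values of a symmetric monomethod block."""
    k = len(mono_block)
    return [mono_block[i][j] for i in range(k) for j in range(i)]

def criterion3_count(validity_values, mono_block):
    """Number of validity values exceeding the median heterotrait-monomethod correlation."""
    threshold = median(heterotrait_values(mono_block))
    return sum(v > threshold for v in validity_values)

# Hypothetical monomethod block for method A with three traits (1.0 on the diagonal
# stands in for the reliability values, which are ignored here).
mono_A = [[1.00, 0.30, 0.50],
          [0.30, 1.00, 0.40],
          [0.50, 0.40, 1.00]]
# Hypothetical validity values of trait 1 between method A and three other methods.
print(criterion3_count([0.45, 0.35, 0.60], mono_A))  # 2
```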

Fourth Criterion: Divergent Validity in Trait and Methods

The fourth criterion states that the patterns of trait inter-relationship should be the same in all heterotrait triangles in both monomethod and heteromethod blocks. Campbell and Fiske have not developed procedures to establish this criterion unequivocally. 

In summary, convergent and divergent validity are established when measurements satisfy the four Campbell and Fiske criteria.

Over the course of time, several more advanced statistical procedures for analyzing MTMM-matrices have been developed (Marsh et al, 1983; Schmitt et al, 1986).

  • Confirmatory factor analysis (LISREL; Jöreskog, 1974) is usually seen as the most appropriate model for evaluating convergent and divergent validity in MTMM-matrices. We tried to apply LISREL analyses to our MTMM-matrix, but because of technical problems we were unable to analyse our data thoroughly.
  • Moreover, analyses of variance are not recommended for the study of MTMM-matrices because, among other reasons, there is no clear equivalence between the ANOVA effects and the Campbell and Fiske criteria (Marsh et al., 1983).

We therefore approach the analyses of our MTMM-matrix of measurements of medical interviewing skills by means of the original Campbell and Fiske criteria.

Constructing an MTMM-matrix for the Medical Interview

In the following section, we will describe:

  • The four instruments which measure medical interviewing skills;
  • The six traits that are discerned;
  • The subjects and experimental setting;
  • The analyses. 

Four instruments measuring medical interviewing skills

The methods of measurement employed in the MTMM validation procedure are expected to meet two requirements:

  • Firstly, the methods are required to be completely independent of each other.
  • Secondly, the instruments are required to measure the traits appropriately, as they have been conceptualized.

Fiske (1971) elaborated the first requirement by stating that the procedures to obtain scores consist of a chain of events between the original behavior of the subject and the assignment of an index.

He distinguished three features in this process:

  • The modes for measuring personality;
  • The method of data-recording;
  • The process of indices production. 

Independent Methods of Measurement

We constructed four independent methods of measuring medical interviewing skills by varying the modes of measurement and the procedures of index construction:

  • Two methods of measurement, namely, the MAAS-MI General and the MAAS-MI Global Expert-Rating Scale, utilize the observation of behavior mode, whereas two other measures, namely, the MAAS-MI Self and the MAAS-MI Global-Self Rating Scale, make use of the self-description mode.
  • In the MAAS-MI General and MAAS-MI Self, researchers constructed the indices, whereas in the MAAS-MI Global Expert-Rating Scale and the MAAS-MI Global Self-Rating Scale, indices were produced by, respectively, experienced general practitioners and the interviewing physicians.

Each measure therefore differed as much as possible from the others. Table 2 shows the modes, the index producers and the resulting measurement instruments.

In the following paragraphs, each of the four methods is briefly characterized.

Table 2 -- Modes, Indices Producers and Resulting Instruments for Four Methods of Measuring Medical Interviewing Skills

1. MAAS-MI General: Observation of Distinct Interview Behaviors by Experts

The instrument is classified as an observation of behavior mode with indices production by the researchers. 

  • In the MAAS-MI, trained observers indicated whether each of 68 discernible units of interview behavior occurred in the course of a medical consultation (see also MAAS-MI General).
  • Items are organized around six scales, pertaining to six theoretical dimensions in medical interviewing skills (see Medical Interview & Related Skills and MAAS Construction).
  • Researchers combined item responses to these scales to construct indices of the physician’s competency on the six dimensions.
  • Scalability was secured by using items that appeared to fit the Rasch model, whereas reliability was enhanced by adding a second observer to the measurement process.
  • Summed scores of pairs of observers for each scale were used to construct the MTMM-matrix. Issues of scalability and reliability are elaborated in Scalability & Reliability.

2. MAAS-MI Self: Distinct Interview Behavior by Self-evaluation

This method is classified as a self-description mode with indices production by the researchers. 

  • The MAAS-MI Self is a self-assessment instrument; see also MAAS-MI Self.
  • The MAAS-MI Self was constructed by slightly transforming the original MAAS items into clear self-descriptions of behavior, such as “I provided information about the cause of the presented problem.” This transformation was possible because of the high face validity of the MAAS-MI G, which was constructed as a feedback tool in medical education (see also MAAS Construction).
  • The MAAS-MI Self is filled in by the interviewing physicians at the end of a medical consultation. Physicians are asked to indicate whether they performed each of 68 units of interview behavior during the preceding interview; each unit is scored as present or absent.
  • Responses on the MAAS-MI Self items identical to the items that formed the Rasch-homogeneous scales of the MAAS-MI G were combined by the researchers to construct indices of the six dimensions of medical interviewing skills.

3. MAAS-MI Global: Global Assessment by Experts

This method utilizes the observation of behavior mode whereas indices are produced by general practitioners. 

  • In the MAAS-MI Global (see also MAAS-MI Global), experienced general practitioners rate the quality of six dimensions of a physician’s medical interviewing skills on a global evaluative rating scale after observing a medical interview. To make their evaluative ratings, experts have only rather implicit criteria at their disposal, based on the face validity of the items. Since no further definitions or scoring criteria are given, the experts are considered to be the indices producers.
  • They have to respond with degrees of agreement or disagreement (5-point Likert-scale) to a set of eight items each pertaining to one of the dimensions discerned in medical interviewing skills. 

4. MAAS-MI Global Self-evaluation

This method is classified as a self-description mode whereas the indices are produced by the physicians. 

  • The MAAS-MI Global-Self Rating Scale (see also MAAS-MI Global-Self) has to be completed by the interviewing physicians themselves after the interview with a (simulated) patient.
  • Physicians are asked to report their subjective impressions of the quality of the interview on eight dimensions. As in the previous method, only global definitions of the dimensions are given, based on the face validity of the items.
  • The interviewing physicians have to evaluate the quality of their own interview behavior on rather implicit criteria and are therefore required to produce the indices themselves. Items are rated on 5-point Likert-type scales. 

Minimize Method Co-variance

Campbell and Fiske require the methods used in the MTMM-validation procedure to be independent in order to minimize the influence of method covariance.

  • Of the methods presented here, we can assume that MAAS-MI G and MAAS-MI Global share method-variance because both are observation instruments, whereas MAAS-MI Self and MAAS-MI Global-Self RS are likely to share method-variance because both are self-description instruments.
  • Furthermore, MAAS-MI G and MAAS-MI Self have in common a similar set of behaviorally described items and a similar procedure of indices formation. MAAS-MI Global and MAAS-MI Global-Self partially converge in that both are global evaluative rating scales without a priori criteria or definitions. We therefore have to acknowledge that even within discernible methods of measurement, convergent and divergent elements can be recognized simultaneously. 

Six dimensions in medical interviewing skills

In addition to the independence of methods, Campbell and Fiske require the traits to be independent. This requirement was posed to achieve near-zero values in the heterotrait-heteromethod and heterotrait-monomethod triangles, and to maximize the differences between the validity coefficients, the trait intercorrelations and the method influences. On reviewing these requirements, Fiske (1971) stated that one can never establish empirically that a trait is uncorrelated with all other traits; this relaxed the requirement of trait independence. It is therefore sufficient that traits are theoretically distinct before they are employed in an MTMM validation procedure.

The theoretical considerations leading to the classification of medical interviewing skills into six dimensions are elaborated in the chapters mentioned above (Medical Interview & Related Skills and MAAS Construction). Here we confine ourselves to summarizing the characteristics of the dimensions.

  • The skill to Explore the Reasons for Encounter refers to a physician’s ability to clarify the patient’s complaint and to explore the motives in the pre-patient phase which led to the patient visiting the physician.
  • History-taking skills enable the physician to generate hypotheses about the nature of the complaint, to test these hypotheses and to describe the patient’s complaint in medical explanatory terms.
  • When Presenting Solutions, physicians convey information on the causes and prognosis of the medical problem; they negotiate with the patient about the medical problem and its solutions, and they provide concrete information on the approach in the near future.
  • Some interviewing skills enable the physician to Structure the medical interview.
  • Moreover, Interpersonal Skills enable the physician to establish optimal rapport with the patient, whereas Communicative Skills promote an effective exchange of information between patient and physician.

These six dimensions constitute the elements of an appropriate initial interview in primary health care. The methods and dimensions are summarized below.

Methods & Dimensions in Medical Interviewing Skills

Methods of measurement:

  • MAAS-MI General
  • MAAS-MI Self
  • MAAS-MI Global
  • MAAS-MI Global Self-rating

Dimensions of medical interviewing skills:

  1. Exploring Reasons for Encounter
  2. History-taking
  3. Presenting Solutions
  4. Structuring
  5. Interpersonal Skills
  6. Communicative Skills

Methods, Analyses & Results


Subjects

All forty residents of the Department of General Practice who participated in the 1984-1985 residency program took part in this study. These 31 men and 9 women (mean age 28.6 years) had finished medical school at a mean age of 26.4. Before starting the residency program, they had worked in health care for an average of 10 months. Physicians were not selected on their interviewing skills before being admitted to the residency program. At the time of the study, 11 residents had almost completed the residency program and the others had just started their training (mean: 5 months in the residency program; range: 1-11 months). During the program, the residents received on average 5-10 hours of courses in medical interviewing skills. More than half of these residents had followed the undergraduate curriculum at Maastricht Medical School (23); 11 came from Nijmegen, 1 from Utrecht, 2 from Amsterdam (UvA) and 3 from Groningen.

Research setting

To secure optimal conditions for measurement, comparability and control, a simulated consultation hour was created in which 40 residents in General Practice interviewed four different simulated patients (Crijnen et al, 1986). Two weeks before the simulated consultation hour took place, residents were informed about the goal and procedures of the study. 

During the simulated consultation hours, residents were asked to behave as if they had taken charge of a colleague’s practice and to perform a complete medical consultation with each simulated patient. Since physical examinations formed no part of the research setting, information about the patient’s physical condition was given to the resident on request. Six rooms with video-equipment were at our disposal in the Skills Laboratory at Maastricht Medical School. 

Following a signal, observers switched on the video-equipment and a simulated patient entered the consultation room. Residents were allowed to speak for a maximum of 15 minutes with the patients.

  • During the consultation, six well-trained observers filled in the MAAS-MI General (instrument 1) and rated the physician’s interview behavior.
  • A second signal, at 15 minutes, indicated that the consultation had to be terminated. The residents then rated their interview on the MAAS-MI Global Self-Rating Scale (instrument 4) and the MAAS-MI Self (instrument 2), and they filled in the scale Medical Problem-solving.
  • In addition, simulated patients filled in the Patient Satisfaction with Communication Checklist. A third signal, at 30 minutes, indicated that simulated patients and observers had to change rooms. This procedure was repeated four times. 

Several months later, eleven general practitioners recruited from the Department of Family Medicine and the Skills Laboratory were asked to observe the videotaped medical interviews and to rate the quality of the physicians’ medical interviewing skills with the MAAS-MI Global (instrument 3). Each interview was observed by two randomly assigned general practitioners. These general practitioners were considered experts because of their long experience in general practice and their teaching positions in the undergraduate medical curriculum or the General Practice residency program. Clinical experience was important in order to anchor MAAS-MI General-scores to clinical reality and relevance, whereas educational experience was considered to enhance the experts’ understanding of what constitutes a good medical interview.

All videotaped medical interviews were observed a second time with the MAAS-MI G by one of three observers. Scale scores from the live and video observations were summed to enhance the reliability of measurement.
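Summing over observations is a simple operation, but spelling it out shows how the index for one scale is formed. In this Python sketch, the item numbers, the scale composition and the responses are hypothetical. The same function also covers the MAAS-MI Global, where the ratings of the two experts who observed an interview are added item by item.

```python
def scale_index(observations, scale_items):
    """Sum item responses over the items of one scale and over all
    observations of the same interview (e.g. live + video, or two experts)."""
    return sum(obs[item] for obs in observations for item in scale_items)

# Hypothetical MAAS-MI G scale consisting of items 3, 7 and 12 (0 = absent, 1 = present).
live = {3: 1, 7: 0, 12: 1}
video = {3: 1, 7: 1, 12: 1}
print(scale_index([live, video], [3, 7, 12]))  # 5

# Hypothetical MAAS-MI Global ratings (5-point scale) of one item by two experts.
print(scale_index([{"structuring": 4}, {"structuring": 3}], ["structuring"]))  # 7
```

The dichotomization of the Interpersonal Skills and Communication Skills scales is left out of the sketch because its predetermined criteria are not specified here.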

The MTMM validation procedure is based on the presentations, by three simulated patients, of the same case: complaints accompanying a myocardial infarction. The case, borrowed from a real case history and documented by a general practitioner and a psychologist, described a 50-year-old building contractor who was worried about his heart because, the night before, he had experienced a short attack of intense chest pain. The patient had smoked for years and had recently gone through a period of severe problems as a result of the economic recession. The simulated patients were recruited from the Skills Laboratory and were instructed by a psychologist and a general practitioner to present the case naturally.

Analyses

Before constructing the multitrait-multimethod matrix, data were prepared for computations.

  • Missing data were estimated by regressing each variable with missing data on the variable with which it correlated most highly (BMDP program PAM-single). In MAAS-MI G and MAAS-MI Global, no data were missing. In MAAS-MI Self and MAAS-MI Global-Self, respectively 0.2% and 0.1% of the data were missing. For MAAS-MI Self, missing data were estimated and replaced. Because one MAAS-MI Global-Self form (GSRS) was not filled out at all, its missing data could not be estimated and this physician’s interview was left out of the analyses. Six additional observations on MAAS-MI G and MAAS-MI Global (GERS) could not be obtained because these consultations were not videotaped owing to technical failure. Since only complete cases were used, data from 33 medical consultations were left for computation.
  • For MAAS-MI G, indices of the six traits were computed by summing, over the live and video observations together, the responses on items that fit the Rasch model. For MAAS-MI Self, indices of the six traits were computed by summing the responses on items that appeared to fit the Rasch-homogeneous MAAS-MI G scales. For MAAS-MI G and MAAS-MI Self, the scales Interpersonal Skills and Communication Skills were dichotomized according to predetermined criteria. For MAAS-MI Global, indices of the six traits were obtained by adding, item by item, the ratings of the pair of experts who observed the same interview. For MAAS-MI Global-Self, indices of the six traits were not transformed.
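The missing-data step can be illustrated with a simplified Python sketch of single-regression imputation: each missing value is predicted from the one other variable with which the incomplete variable correlates most strongly, and afterwards only complete cases are retained. This illustrates the general idea only; it is not a reproduction of the BMDP PAM program, and the variable names and data are hypothetical.

```python
import math

def _fit(pairs):
    """Least-squares slope, intercept and Pearson r for (x, y) pairs; None if degenerate."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)

def impute_single_regression(data, target):
    """Return data[target] with None values predicted by regression on the
    variable most highly correlated with it (pairwise-complete cases)."""
    y = data[target]
    best = None  # (|r|, slope, intercept, predictor name)
    for name, x in data.items():
        if name == target:
            continue
        pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
        fit = _fit(pairs) if len(pairs) >= 3 else None
        if fit is not None and (best is None or abs(fit[2]) > best[0]):
            best = (abs(fit[2]), fit[0], fit[1], name)
    if best is None:
        return list(y)
    _, slope, intercept, name = best
    return [b if b is not None else (slope * a + intercept if a is not None else None)
            for a, b in zip(data[name], y)]

def complete_cases(data):
    """Indices of cases observed on every variable (listwise deletion)."""
    n = len(next(iter(data.values())))
    return [i for i in range(n) if all(col[i] is not None for col in data.values())]

# Hypothetical item scores for five physicians; item_a has one missing value.
data = {"item_a": [1, 2, 3, 4, None],
        "item_b": [2, 4, 6, 8, 10],
        "item_c": [5, 1, 4, 2, 3]}
print(complete_cases(data))                      # [0, 1, 2, 3]
data["item_a"] = impute_single_regression(data, "item_a")
print(data["item_a"])                            # [1, 2, 3, 4, 5.0]
print(complete_cases(data))                      # [0, 1, 2, 3, 4]
```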

Firstly, means, standard deviations and ranges were computed for each trait as measured by each method. Results are presented in Table 4.

Secondly, the multitrait-multimethod matrix was constructed by computing Pearson product-moment correlations between the six traits as measured by each of the four methods. The median of each validity diagonal, each heterotrait-heteromethod triangle and each heterotrait-monomethod triangle was calculated. Because each validity diagonal contains six values, its median was taken as the value midway between the two middle values. Results are displayed in Table 5.
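The matrix construction and the median calculations can be sketched in Python as follows. The data layout (a nested dictionary of score lists indexed by method and trait) is an assumption made for illustration, and the example scores are invented. Python's `statistics.median` returns the mean of the two middle values for an even number of observations, which matches the rule used here for the six-value validity diagonals.

```python
import math
from itertools import combinations
from statistics import median

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mtmm_matrix(scores):
    """scores[method][trait] is a list with one score per physician (complete cases).
    Returns {(var1, var2): r} for every pair of distinct (method, trait) variables."""
    variables = [(m, t) for m in scores for t in scores[m]]
    return {(v, w): pearson(scores[v[0]][v[1]], scores[w[0]][w[1]])
            for v, w in combinations(variables, 2)}

def r_of(matrix, v, w):
    """Look up a correlation regardless of the order of the two variables."""
    return matrix[(v, w)] if (v, w) in matrix else matrix[(w, v)]

def mtmm_medians(matrix, methods, traits):
    """Medians of each heterotrait-monomethod triangle, each validity diagonal and
    each of the two heterotrait-heteromethod triangles in every heteromethod block."""
    out = {}
    for m in methods:
        out[("monomethod", m)] = median(
            r_of(matrix, (m, s), (m, t)) for s, t in combinations(traits, 2))
    for m1, m2 in combinations(methods, 2):
        out[("validity", m1, m2)] = median(r_of(matrix, (m1, t), (m2, t)) for t in traits)
        # Lower triangle: trait i of m1 with an earlier trait j of m2; upper: the reverse.
        pairs = [(i, j) for i in range(len(traits)) for j in range(len(traits)) if i != j]
        for label, keep in (("heteromethod-lower", lambda i, j: j < i),
                            ("heteromethod-upper", lambda i, j: j > i)):
            out[(label, m1, m2)] = median(
                r_of(matrix, (m1, traits[i]), (m2, traits[j])) for i, j in pairs if keep(i, j))
    return out

# Tiny hypothetical data set: two methods, two traits, four physicians.
scores = {"A": {"t1": [1, 2, 3, 4], "t2": [2, 1, 4, 3]},
          "B": {"t1": [1, 3, 2, 4], "t2": [4, 3, 2, 1]}}
matrix = mtmm_matrix(scores)
medians = mtmm_medians(matrix, ["A", "B"], ["t1", "t2"])
print(round(medians[("validity", "A", "B")], 2))  # 0.1: midway between 0.8 and -0.6
```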

First criterion

To establish the first criterion, the number of significant correlations (p<.05) on the validity diagonals was counted for both traits and methods. Results are displayed in Table 6.
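The tally behind the first criterion can be sketched in Python as below; the data structure and the short method and trait labels are assumptions for illustration. With four methods there are six method pairs, so each trait has six validity values and each method takes part in 3 × 6 = 18, the maxima given for Table 6.

```python
from itertools import combinations

def criterion1_tally(validity, methods, traits, r_crit=0.29):
    """Count validity-diagonal correlations above r_crit per trait and per method.

    validity[(m1, m2)][trait] holds the correlation between `trait` measured by
    methods m1 and m2. The default r_crit = .29 is the p < .05 threshold reported
    with Table 5 for N = 33; pass another value for other sample sizes.
    """
    per_trait = {t: 0 for t in traits}
    per_method = {m: 0 for m in methods}
    for m1, m2 in combinations(methods, 2):
        for t in traits:
            if validity[(m1, m2)][t] > r_crit:
                per_trait[t] += 1
                per_method[m1] += 1
                per_method[m2] += 1
    return per_trait, per_method

methods = ["G", "Self", "Global", "GlobalSelf"]
traits = ["Exploring", "History", "Solutions", "Structuring", "Interpersonal", "Communicative"]
# Upper bound: if every validity value were significant, each trait scores 6 and each method 18.
all_high = {pair: {t: 0.5 for t in traits} for pair in combinations(methods, 2)}
print(criterion1_tally(all_high, methods, traits))
```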

Second criterion

To establish the second criterion, we counted the number of times that correlations on the validity diagonal exceeded the correlations in the corresponding column and row of the two adjoining heterotrait-heteromethod triangles. Each value on the validity diagonal was thus compared to 10 other correlations (five in its row and five in its column). Results are displayed in Table 7.

Third criterion

To establish the third criterion, the number of times that the three validity-values for each trait exceeded the median of each heterotrait-monomethod triangle was counted. Results are displayed in Table 8. A second operationalization of the third criterion by comparing the validity-values with each of the correlations in the monomethod triangles separately revealed essentially the same information and is therefore not presented. 

Fourth criterion

To establish the fourth criterion, we considered factor-analyzing parts of our correlation matrix. Because the number of participating physicians was too small, factor analyses would have yielded unstable results. It was therefore decided not to study the fourth criterion.

Results

Inspection of Table 4 reveals that, according to MAAS-MI G scores, most physicians display only a limited number of the interviewing skills that could be displayed during a medical consultation. Furthermore, average MAAS-MI G scores are lower than MAAS-MI Self scores, and MAAS-MI Global scores are lower than MAAS-MI Global-Self scores. MAAS-MI G scores lie below or just above the scale midpoints, whereas scores on MAAS-MI Self, MAAS-MI Global and MAAS-MI Global-Self lie above the scale midpoints. Standard deviations are almost identical for MAAS-MI G and MAAS-MI Self, and for MAAS-MI Global and MAAS-MI Global-Self. The ranges show that MAAS-MI G scales never reach the upper limits of scoring, whereas all other scales do.

First Criterion

With regard to the first criterion, for MAAS-MI G and MAAS-MI Self, about 60% of the correlations on the validity diagonal are significant, and for MAAS-MI Global and MAAS-MI Global-Self respectively, 50% and 40% (Table 6). The validity of the dimensions Structuring and Interpersonal Skills is supported almost always; of the dimensions History-taking, Presenting Solutions and Communication Skills, only half of the time; the validity of the dimension Exploring Reasons for Encounter is never supported. 

Considering that support by MAAS-MI Global and MAAS-MI Self is most important, we conclude that the validity of the MAAS-MI G scales History-taking, Presenting Solutions, Structuring and Interpersonal Skills is confirmed, that the validity of the scale Exploring Reasons for Encounter is discredited, and that the validity of the scale Communication Skills is neither supported nor discredited.

Second Criterion

With regard to the second criterion, inspection of Table 7 shows that Presenting Solutions and Structuring the interview, and, to a lesser extent, History-taking and Interpersonal Skills, can be clearly differentiated from each other. The results support the distinct character of these dimensions. Exploring Reasons for Encounter and Communication Skills are differentiated less clearly from other dimensions.

The number of times that the validity diagonals between pairs of methods exceed the correlations in the adjoining rows and columns (Table 7) shows that MAAS-MI G/MAAS-MI Global and MAAS-MI G/MAAS-MI Self distinguish the dimensions of medical interviewing skills adequately. The other combinations appear to differentiate the dimensions less well. MAAS-MI G seems best able to discern different dimensions of medical interviewing skills. MAAS-MI Global and MAAS-MI Self rank second and third, whereas MAAS-MI Global-Self is almost unable to distinguish dimensions of interviewing skills.

Third Criterion

With regard to the third criterion, the medians of the heterotrait-monomethod triangles, shown in Table 5, reveal the considerable impact of the method on the measurement of medical interviewing skills.

The differences between methods are strikingly large: the medians range from .22 for MAAS-MI G to .63 for MAAS-MI Global.

The results summarized in Table 8 reveal that MAAS-MI G is the least affected by a disturbing influence of the method of measurement, followed by MAAS-MI Self. The measurement qualities of both rating scales, MAAS-MI Global and MAAS-MI Global-Self, are considered to be impaired by strong method influences. Moreover, inter-observer reliability for MAAS-MI Global, expressed as Pearson product-moment correlations between pairs of experts, is low to moderate: correlations between experts on each item vary from .15 to .42.

Table 4 -- Scale Midpoint, Mean, Standard Deviation and Range For Scales of MAAS-MI General, MAAS-MI Self, MAAS-MI Global and MAAS-MI Global-Self
Table 5 -- Multitrait-multimethod Matrix of the MAAS-MI General and Other Methods of Measuring Medical Interviewing Skills

Below diagonal: Pearson correlation coefficients between 4 methods and 6 traits of medical interviewing skills.

Above diagonal: median values for the validity diagonals, the heterotrait-heteromethod triangles (in brackets and bold) and the heterotrait-monomethod triangles.

On diagonal and in brackets: Correlation between two experts in MAAS-MI Global Expert Rating Scale.

N = 33 physicians. Case: myocardial infarction. If r > .29, then p < .05; if r > .40, then p < .01.

Table 6 -- Number of Significant Correlations (p<.05) on the Validity Diagonal for Each Trait and Method (Maximum: 6 per Trait, 18 per Method)
Table 7 -- Number of Values on the Validity-diagonal Higher Than Heterotrait-heteromethod Values in Corresponding Column and Row (maximum = 10)
Table 8 -- Number of Times That the Three Validity values of each Trait are Higher Than Median of a Heterotrait-monomethod Triangle (Maximum = 3)

Discussion

The following section discusses the convergent and divergent validity of the MAAS-MI in General Medicine, examined according to Campbell and Fiske’s criteria (1959). In addition, the validity of three other methods of measuring medical interviewing skills is scrutinized.

Criterion 1 – Convergent Trait Validity

Criterion 1 states that values on the validity-diagonal should be large enough to support convergent validity. 

MAAS-MI General

For MAAS-MI General, confirmation of convergent validity by MAAS-MI Global is especially encouraging because it indicates that general practitioners with experience in medical education and primary health care agree with the way the MAAS-MI General scales operationalize the dimensions History-taking, Presenting Solutions, Structuring and Interpersonal Skills.

Convergent validity of four MAAS-MI General scales is confirmed

Moreover, convergent validity of these scales is underscored by physicians’ recordings of their own interviewing skills in MAAS-MI Self. 

The insufficient evidence of convergent validity for the scales Exploring Reasons for Encounter and Communication Skills is disappointing and needs further elaboration. The lack of validity can be attributed to:

  • Vagueness of the underlying theoretical concept;
  • Inadequate operationalization of the theoretical dimension in the items of the scale;
  • Insufficient measurement properties.

With regard to the operationalization of Exploring Reasons for Encounter, one aspect is considered to be missing. In addition to eliciting information about factors in the pre-patient phase that led to the visit, physicians should ask patients to formulate their request for help explicitly. Item 6 of the MAAS-MI General clearly pertains to the patient’s request for help, but in our opinion more attention should be given to this issue because of its steering influence on the content and process of an initial interview.

Eisenthal and Lazare (1976, 1983) found that interview behavior which helped patients to put their request into words was related to feelings of being helped, to satisfaction and to ‘plan wanted’. Patients find it difficult to verbalize their request for help, although they consider this to be very important. Structuring by the physician, together with his or her collaborative involvement, stimulates patients to formulate their request for help. More items on the scale Exploring Reasons for Encounter must therefore focus on this issue.

A second reason for insufficient support of convergent validity lies in the characteristics of global rating scales, which are considered to impair the quality of measurement. This issue is discussed under the third criterion.

Communication Skills, on the other hand, are measured unreliably by the MAAS-MI General, which hinders the determination of any form of validity (see MAAS-MI General).

MAAS-MI Self

For MAAS-MI Self, evidence of convergent validity is available for the scales History-taking and Structuring and, to a lesser extent, for Presenting Solutions and Interpersonal Skills. No evidence of convergent validity is obtained for Exploring Reasons for Encounter and Communication Skills.

MAAS-MI Self is a valid tool for self-evaluation of interviewing skills in medical school and residency training

Essentially the same applies to MAAS-MI Self as to MAAS-MI General. However, because a classical test-retest design cannot be carried out, the unreliability of MAAS-MI Self has to be taken into account as a confounding influence on the validation process.

MAAS-MI Global

For MAAS-MI Global, the validity for History-taking, Presenting Solutions, Structuring and Interpersonal Skills is confirmed; the validity of Exploring Reasons for Encounter is discredited, and the validity for Communication Skills is neither supported nor discredited.

With regard to the measurement characteristics of global rating scales, it is known that raters are unable to assess more than two dimensions of performance accurately. In medical education, physicians mostly discern a problem-solving dimension and an interpersonal-skills dimension, which largely agrees with the results presented here (Dielman et al, 1980; Streiner, 1985).

  • History-taking, Presenting Solutions and Structuring are considered as reflecting the problem-solving dimension;
  • Whereas Interpersonal Skills reflects the interpersonal dimension;
  • Exploring Reasons for Encounter and Communication Skills are not clearly discerned by general practitioners. 

MAAS-MI Global-Self

For MAAS-MI Global-Self, convergent validity of Interpersonal Skills is unequivocally supported by the validity coefficients. Apparently, a physician’s experience of the interpersonal skills they displayed during the interview agrees with the impressions of MAAS-MI General observers and of experts. This matters, because it confirms the validity of an important but difficult-to-measure quality of a medical interview.

MAAS-MI Global-Self is well able to measure interpersonal skills in a medical consultation

Since convergent validity of the other dimensions is supported only by strong correlations with MAAS-MI Self and not by MAAS-MI General or MAAS-MI Global, we conclude that the validity of global self-rating scales of medical interviewing skills has to be questioned, with the exception of measures of interpersonal skills.

In Conclusion

MAAS-MI General, MAAS-MI Self and MAAS-MI Global display evidence of convergent validity for History-taking, Presenting Solutions, Structuring and Interpersonal Skills. Insufficient evidence was obtained to support convergent validity of the Exploration of Reasons for Encounter and Communication Skills.

For MAAS-MI Global-Self, convergent validity is obtained for the measure of Interpersonal Skills, whereas the validity of the other measures is discredited. 

Criterion 2 – Divergent Trait Validity

To support divergent validity with regard to the dimensions of interest, criterion 2 states that values on the validity-diagonal should be higher than the values in the corresponding column and row of the heterotrait-heteromethod triangle. Campbell and Fiske’s goal was to verify that a method of measurement can distinguish the dimension of interest from several other dimensions. They required the median of each heterotrait-heteromethod triangle to approach zero in order to facilitate the determination of divergent validity. The median values, shown in Table 5, reveal that none of them approaches zero, which suggests that the methods and/or the dimensions are related. We expected this, because we were not able to construct totally independent methods of measurement and because the theoretical dimensions discerned in medical interviewing skills are bound to be related to some extent.
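The row-and-column comparison of criterion 2, together with the median check described above, can be sketched as follows. The heteromethod block holds hypothetical toy correlations (not the study’s Table 5 or Table 7 values), and the off-diagonal cells of that block are treated as the two heterotrait-heteromethod triangles. Function names are illustrative assumptions.

```python
from statistics import median

# Hypothetical heteromethod block (toy values): row i = trait i by method A,
# column j = trait j by method B. Off-diagonal cells form the two
# heterotrait-heteromethod triangles.
HETERO_BLOCK = [
    [0.70, 0.35, 0.30],
    [0.40, 0.65, 0.30],
    [0.25, 0.35, 0.60],
]


def criterion2(block):
    """For each trait, True if its validity value exceeds every other value
    in the same row and the same column of the heteromethod block."""
    n = len(block)
    results = []
    for i in range(n):
        validity = block[i][i]
        rivals = [block[i][j] for j in range(n) if j != i]   # same row
        rivals += [block[j][i] for j in range(n) if j != i]  # same column
        results.append(all(validity > r for r in rivals))
    return results


def heterotrait_heteromethod_median(block):
    """Median of all off-diagonal cells of the heteromethod block."""
    n = len(block)
    return median([block[i][j] for i in range(n) for j in range(n) if i != j])


if __name__ == "__main__":
    print(criterion2(HETERO_BLOCK))
    print(heterotrait_heteromethod_median(HETERO_BLOCK))
```

A median well above zero, as in this toy block, is the pattern the text reports: the traits and/or methods are not fully independent even when each validity value wins its row-and-column comparison.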

The results, shown in the right-hand column of Table 7, indicate that the dimensions Presenting Solutions and Structuring and, to a lesser extent, History-taking and Interpersonal Skills are clearly differentiated from each other. These results support the distinct character of the dimensions and underscore the theoretical considerations that led to the differentiation of medical interviewing skills into six distinct dimensions. Once again, Exploring Reasons for Encounter and Communication Skills are less well discerned, owing to low correlations on the validity diagonals.

As Table 7 shows, the combinations MAAS-MI General/MAAS-MI Global and MAAS-MI General/MAAS-MI Self distinguish the different types of medical interviewing skills most adequately; the other combinations of methods differentiate the dimensions less well. MAAS-MI General thus appears to be best able to discern different types of medical interviewing skills.

MAAS-MI General discerns different types of medical interviewing skills best: History-taking, Presenting Solutions, Structuring the interview and Interpersonal Skills are distinct types of medical interviewing skills, whereas Exploration of Reasons for Encounter and Communication Skills are harder to distinguish

MAAS-MI Global and MAAS-MI Self are second and third, whereas MAAS-MI Global-Self is almost unable to discern dimensions, with the exception of Interpersonal Skills

In Conclusion

Four dimensions of medical interviewing skills that were discerned theoretically and used to construct the MAAS-MI General-scales can be distinguished empirically. History-taking, Presenting Solutions, Structuring the interview and Interpersonal Skills are distinct types of medical interviewing skills.

Difficulties arise in distinguishing the dimensions referring to the Exploration of Reasons for Encounter and Communication Skills

Criterion 3 – Trait versus Method Variance

The third criterion was formulated to secure optimal measurement of the dimensions, because every psychological measurement device has features that are specific to the dimension of interest and other features that are characteristic of the method being employed. Criterion 3 states that values on the validity diagonal must be higher than the off-diagonal values in the monomethod triangle. Since the process of measuring always elicits irrelevant method variance, measurements are considered to be invalidated to the extent that method variance contributes to the scores obtained.
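A sketch of criterion 3 for two methods, A and B, follows. Each validity value is compared with the same trait’s correlations with the other traits inside both monomethod triangles; the median of each monomethod triangle is also computed, since the discussion of MAAS-MI Global-Self below relies on such medians. All matrices are hypothetical toy values, with method B deliberately given inflated within-method correlations to mimic a halo-prone rating method. Function names are illustrative assumptions.

```python
from statistics import median

# Hypothetical toy correlations for three traits and two methods, A and B.
# HETERO_BLOCK: trait i by method A (rows) against trait j by method B (columns).
HETERO_BLOCK = [
    [0.70, 0.35, 0.30],
    [0.40, 0.65, 0.30],
    [0.25, 0.35, 0.60],
]
# Monomethod blocks: correlations between traits measured with the same method.
MONO_A = [
    [1.00, 0.30, 0.20],
    [0.30, 1.00, 0.25],
    [0.20, 0.25, 1.00],
]
MONO_B = [  # deliberately inflated, as a halo-prone rating method might produce
    [1.00, 0.60, 0.55],
    [0.60, 1.00, 0.68],
    [0.55, 0.68, 1.00],
]


def criterion3(hetero, mono_a, mono_b):
    """For each trait, True if its validity value exceeds its correlations
    with the other traits inside both monomethod triangles."""
    n = len(hetero)
    results = []
    for i in range(n):
        rivals = [m[i][j] for m in (mono_a, mono_b) for j in range(n) if j != i]
        results.append(all(hetero[i][i] > r for r in rivals))
    return results


def monomethod_median(mono):
    """Median of the heterotrait-monomethod triangle (cells above the diagonal)."""
    n = len(mono)
    return median([mono[i][j] for i in range(n) for j in range(i + 1, n)])


if __name__ == "__main__":
    print(criterion3(HETERO_BLOCK, MONO_A, MONO_B))
    print(monomethod_median(MONO_A), monomethod_median(MONO_B))
```

In this toy case only the first trait passes, because the inflated within-method correlations of method B outweigh the second and third validity values: method variance, not trait variance, dominates.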

A look at Table 8 reveals that features of the measurement process impinge strongly upon the scores obtained with MAAS-MI Global and MAAS-MI Global-Self, whereas a smaller influence of the method is observed for MAAS-MI Self and MAAS-MI General. Since the interview behavior on which the data are based was similar for all methods, the differences can be attributed to the methods that were employed. 

The third criterion undeniably discloses the difficulties that arise in psychological measurement.

MAAS-MI General

Of all methods, MAAS-MI General demonstrates the best measurement properties because it evokes a low degree of method variance and shows considerable correlations on the validity-diagonals.

Of all methods, MAAS-MI General demonstrates the best measurement properties

Once again, Exploring Reasons for Encounter and Communication Skills are measured improperly and are therefore primarily responsible for the failure of MAAS-MI General on the third criterion. However, the generalizability study discussed in MAAS-MI General showed that Exploring Reasons for Encounter in particular is measured fairly reliably, with low levels of method variance. Furthermore, one of Campbell and Fiske’s requirements is that each of the methods employed should appropriately measure the dimensions as conceptualized. It is therefore our opinion that the lack of success of Exploring Reasons for Encounter on the third criterion can be partly attributed to the failure of the other methods to measure this dimension properly.

MAAS-MI Self

MAAS-MI Self displays more method variance than MAAS-MI General, and less than MAAS-MI Global and MAAS-MI Global-Self. Table 4 shows that the mean of each MAAS-MI Self scale is considerably higher than the mean of the corresponding MAAS-MI General scale. This interesting finding demonstrates that interviewing physicians believe they perform more facets of interview behavior than they actually do. In examination situations, we have often noted that medical students confused information given by the patient with their own questioning behavior: when they received information, they often thought they had asked for it. This induces unreliability in MAAS-MI Self measures. In conclusion, influences of the self-description mode applied in MAAS-MI Self are likely to interfere negatively with the measurement of the dimensions.

MAAS-MI Global

It is evident that MAAS-MI Global is affected strongly by the method of measurement, which consists of observation of behavior and, subsequently, a rating by experts. Although a considerable influence of the method was expected to occur, we were surprised by the strength of the halo-effect.

MAAS-MI Global is strongly plagued by method variance, especially halo-effects, but also by leniency and central tendency

Halo:

  • Is conceptualized as an observer’s failure to discriminate among conceptually distinct and potentially independent aspects of a subject’s behavior;
  • Is operationalized by high intercorrelations between different dimensions (Saal et al, 1980).
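Following the operationalization above, a simple halo index is the mean intercorrelation between the dimensions a rater scores. The sketch below assumes hypothetical ratings (one row per observed interview, one column per dimension, on an assumed 1-7 scale); the mean correlation is only one simple index among several possible ones, and the function names are illustrative.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def halo_index(ratings):
    """Mean intercorrelation between rated dimensions.

    ratings: one row per observed interview, one column per dimension.
    Values near 1 mean the rater barely discriminates between dimensions.
    """
    columns = list(zip(*ratings))
    rs = [pearson(a, b) for a, b in combinations(columns, 2)]
    return sum(rs) / len(rs)


# Hypothetical ratings of four interviews on three dimensions.
UNDIFFERENTIATED = [[5, 5, 5], [6, 6, 6], [4, 4, 4], [7, 7, 7]]
DIFFERENTIATED = [[5, 3, 6], [6, 5, 4], [4, 6, 5], [7, 4, 3]]

if __name__ == "__main__":
    print(round(halo_index(UNDIFFERENTIATED), 2))
    print(round(halo_index(DIFFERENTIATED), 2))
```

A rater who gives each interview the same score on every dimension yields an index of 1.0, the extreme form of halo.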

We asked general practitioners with experience in both general practice and medical education to participate in this study, especially because they were supposed to be able to distinguish the occurrence and quality of different medical interviewing skills. However, even experts experience difficulties in discerning dimensions in medical interviewing skills when no clearly-worded and well-defined items are available.

Even experts experience difficulties in discerning interviewing skills when no well-defined and clearly-worded items are available

Ratings of distinct types of interviewing skills displayed during a medical consultation are reduced to a judgment about a problem-solving and an interpersonal-skills dimension (Dielman et al, 1980; Streiner, 1985).

A second type of method influence, leniency (a rater’s tendency to assign systematically higher or lower ratings to a subject’s behavior), also appears to occur, because all averaged ratings of MAAS-MI Global are above the midpoint of the scales: most experts use the positive part of the scale continuum.

Finally, restriction of range, the third type of method influence, seems to take place because most raters do not use the extreme ends of the scales; the negative side in particular is almost never used. We therefore conclude that strong halo-effects, leniency and central tendency are all likely to impede the measurement properties of the MAAS-MI Global.
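The two indicators used above, averaged ratings above the scale midpoint (leniency) and unused extreme or negative categories (restriction of range), can be computed descriptively as sketched below. The ratings and the 1-7 scale are hypothetical assumptions, not MAAS-MI Global data, and the function name is illustrative.

```python
from statistics import mean, pstdev


def rating_diagnostics(ratings, low, high):
    """Simple descriptive checks for leniency and restriction of range.

    ratings: scores on one scale; low/high: the scale's end points.
    """
    midpoint = (low + high) / 2
    n = len(ratings)
    return {
        # Positive values: ratings drift toward the favorable pole (leniency).
        "mean_minus_midpoint": mean(ratings) - midpoint,
        # A small spread relative to the scale length hints at restricted range.
        "sd": pstdev(ratings),
        # Share of ratings in the two extreme categories.
        "share_extreme": sum(r in (low, high) for r in ratings) / n,
        # Share of ratings on the negative half of the scale.
        "share_below_midpoint": sum(r < midpoint for r in ratings) / n,
        "categories_used": sorted(set(ratings)),
    }


# Hypothetical expert ratings on an assumed 1-7 scale (not the study's data).
EXPERT_RATINGS = [4, 5, 5, 6, 4, 5, 6, 5]

if __name__ == "__main__":
    for key, value in rating_diagnostics(EXPERT_RATINGS, 1, 7).items():
        print(key, value)
```

These are descriptive flags only; whether a positive shift reflects leniency or genuinely good interviews cannot be decided from one rater’s scores alone.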

MAAS-MI Global-Self

The influence of the method, especially of the halo-effect, on MAAS-MI Global-Self is considered to be close to its influence on MAAS-MI Self, because the median of the MAAS-MI Global-Self monomethod triangle is near the median for MAAS-MI Self. We had expected halo in MAAS-MI Global-Self to approach halo in MAAS-MI Global, because both global rating scales share the feature that the behavior of interest is not well defined. It seems that physicians who are in the actual interview situation differentiate more than experts do.

Moreover, leniency is suspected of influencing MAAS-MI Global-Self strongly, because averaged ratings on MAAS-MI Global-Self are significantly above the midpoint of the scales, which decreases the amount of variance. Interviewing physicians rate the quality of their own interview behavior more positively than observers do. Finally, restriction of range seems to take place because categories on the negative side of the scales are almost never used.

In conclusion, we observe that it is halo, leniency and restriction of range in particular that tend to diminish the measurement properties of the MAAS-MI Global-Self. Since MAAS-MI Global-Self uses a single, poorly defined item to represent each dimension, this method of measuring medical interviewing skills is considered to be highly unreliable.

How should the third criterion be regarded?

Campbell and Fiske constructed this criterion in order to determine the major sources of variation in measurements and in order to conclude that enough trait variance was measured to sustain optimal measurement. These precautions were taken to secure the process of measurement. With regard to our study, the third criterion undeniably discloses the difficulties that arise in psychological measurement: halo, leniency and restriction of range appear to occur to varying degrees in our measurements, but MAAS-MI General appears to be the least influenced by method variance.

Criterion 4 – Divergent Validity in Trait and Method

The fourth criterion states that the patterns of trait inter-relationship should be the same in all heterotrait triangles in both monomethod and heteromethod blocks in order to provide evidence for divergent validity. Satisfaction of this criterion would suggest that the underlying traits are really correlated, whereas failure of this criterion would imply that the observed correlation between traits assessed by a given method is due to a method or halo bias (Marsh and Hocevar, 1983). The interpretation of the fourth criterion has posed problems for us and for several other researchers, because Campbell and Fiske did not operationalize it. Some authors have merely mentioned the fourth criterion in a publication but have not applied it to their data (Marsh and Hocevar, 1983). Other authors have considered this criterion too strict and therefore unrealistic (Magnusson, 1966).
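Because Campbell and Fiske did not operationalize the fourth criterion, any computational version is an interpretation. One possible reading, sketched below as an assumption rather than as the authors’ procedure, compares the rank order of the trait-pair correlations across heterotrait triangles using Goodman and Kruskal’s gamma: values near 1 mean two triangles order the trait pairs the same way. The blocks are hypothetical toy values and the helper names are illustrative.

```python
from itertools import combinations


def upper_triangle(block):
    """Cells (0,1), (0,2), (1,2), ... above the diagonal."""
    n = len(block)
    return [block[i][j] for i in range(n) for j in range(i + 1, n)]


def lower_triangle(block):
    """Cells (1,0), (2,0), (2,1), ... in the same trait-pair order as upper_triangle."""
    n = len(block)
    return [block[j][i] for i in range(n) for j in range(i + 1, n)]


def gamma(x, y):
    """Goodman and Kruskal's gamma: (concordant - discordant) / (concordant + discordant),
    ignoring tied pairs. 1.0 means both patterns rank the trait pairs identically."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else float("nan")


def pattern_agreement(triangles):
    """Gamma for every pair of heterotrait triangles (dict keyed by triangle name)."""
    return {(a, b): gamma(triangles[a], triangles[b])
            for a, b in combinations(list(triangles), 2)}


# Hypothetical toy blocks for three traits and two methods, A and B.
MONO_A = [[1.0, 0.30, 0.20], [0.30, 1.0, 0.25], [0.20, 0.25, 1.0]]
MONO_B = [[1.0, 0.60, 0.55], [0.60, 1.0, 0.68], [0.55, 0.68, 1.0]]
HETERO_AB = [[0.70, 0.35, 0.30], [0.40, 0.65, 0.30], [0.25, 0.35, 0.60]]

TRIANGLES = {
    "mono A": upper_triangle(MONO_A),
    "mono B": upper_triangle(MONO_B),
    "hetero A-B (upper)": upper_triangle(HETERO_AB),
    "hetero A-B (lower)": lower_triangle(HETERO_AB),
}

if __name__ == "__main__":
    for pair, g in pattern_agreement(TRIANGLES).items():
        print(pair, round(g, 2))
```

Even this small toy case produces six pairwise comparisons; with more traits and methods the number of triangles and comparisons grows quickly.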

In our study, it is unrealistic to interpret a correlation matrix consisting of 288 correlations: each interpretation can be refuted by other correlations that suggest a different explanation. Furthermore, (parts of) the correlation matrix cannot be factor analyzed because of the small number of students (see also MAAS-MI Mental Health). We therefore decided not to apply the fourth criterion to our correlation matrix. This decision implies that no clear pattern of interrelations between the dimensions was observed in our data, and that method influences are likely to affect the strength of the correlations; the latter, however, had already been revealed by the third criterion.

Concluding Remarks

The convergent and divergent validity of the MAAS-MI General, as well as the validity of three other methods of measuring medical interviewing skills, was studied by means of the multi-trait, multi-method matrix, in which several dimensions of medical interviewing skills are measured with several methods. The resulting correlation matrix was scrutinized by means of four criteria developed by Campbell and Fiske (1959).

For the MAAS-MI General, the convergent validity of History-taking, Presenting Solutions, Structuring and Interpersonal Skills is clearly warranted by the strength of the correlations, whereas the Exploration of Reasons for Encounter and Communication Skills fail to provide evidence of convergent validity.

Essentially the same conclusions can be drawn for a self-evaluation variant of the MAAS-MI General (MAAS-MI Self) and for the MAAS-MI Global Expert-Rating Scale, whereas for the MAAS-MI Global Self-Rating Scale, insufficient evidence of convergent validity is obtained, with the exception of the measurement of Interpersonal Skills.

Divergent validity of dimensions of medical interviewing skills is established for History-taking, Presenting Solutions, Structuring and Interpersonal Skills. Difficulties arise in distinguishing the dimensions referring to Exploring Reasons for Encounter and Communication Skills. Furthermore, MAAS-MI General appears to be the most effective in discerning dimensions, followed by MAAS-MI Global and MAAS-MI Self, whereas MAAS-MI Global-Self is unable to distinguish dimensions. However, Exploring Reasons for Encounter may well be poorly distinguished because the global measures are unable to capture it.

Moreover, MAAS-MI General displays the best measurement properties because it evokes only low degrees of method-variance when compared to other methods. Halo, leniency and restriction of range are inclined to diminish the measurement properties of MAAS-MI Global and MAAS-MI Global-Self, and partly of MAAS-MI Self. 

MAAS-MI General displays the best evidence of convergent and divergent validity

All in all, MAAS-MI General appears to be the best method of measurement of medical interviewing skills because it displays evidence of convergent and divergent validity, and is minimally influenced by the method of measurement.

References

Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959; 56: 81-105.

Crijnen AAM, Thiel J van, Kraan HF. Evaluatie van consultvoering: een spreekuur nagebootst (Evaluation of a medical consultation: simulating consultation hours). Huisarts en Wetenschap, 1986; 29: 316-318.

Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin, 1955; 52: 281-302.

Dielman TW, Hull AL, Davis WK. Psychometric properties of clinical performance ratings. Evaluation and the Health Professions, 1980; 3: 103-117.

Eisenthal S, Lazare A. Expression of patient’s request in the initial interview. Psychological Reports, 1977; 40: 131-138.

Eisenthal S, Koopman C, Lazare A. Process analysis of two dimensions of the negotiated approach in relation to satisfaction in the initial interview. Journal of Nervous and Mental Disease, 1983; 171: 49-54.

Fiske DW. Measuring the concepts of personality. Aldine Publishing Company, Chicago, 1971.

Gonella JS. Evaluation of clinical competence (editorial). Journal of Medical Education, 1985; 60: 70-71.

Jöreskog KG, Sörbom D. LISREL IV: A general computer program for estimation of linear structural equation systems by maximum likelihood methods. University of Uppsala, Uppsala, 1978.

Katz FM. Trends in assessment (editorial). Medical Education, 1982; 16: 61-62.

Kerlinger FN. Foundations of behavioral research. Holt, Rinehart and Winston, Inc., New York, 1981.

Magnusson D. Test theory. Addison-Wesley, Reading, Massachusetts, 1967.

Marsh HW, Hocevar D. Confirmatory factor analysis of multitrait-multimethod matrices. Journal of Educational Measurement, 1983; 20: 231-248.

Saal FE, Downey FG, Lahey NA. Rating the ratings: assessing the psychometric quality of rating data. Psychological Bulletin, 1980; 88: 413-428.

Schmitt N, Stults DM. Methodology review: analysis of multitrait-multimethod matrices. Applied Psychological Measurement, 1986; 10: 1-22.

Streiner DL. Global rating scales. In: Neufeld VR, Norman GR (Eds.). Assessing clinical competence. Springer Publishing Company, New York, 1985.

Thorndike RL. Applied psychometrics. Houghton Mifflin Company, Boston, 1982.