Methods, Analyses & Results
Little attention has been paid to testing the validity of methods for measuring medical-interview performance.
We therefore decided that it was important to test both the convergent and divergent validity of the MAAS Medical Interview. To do this, we used the multi-trait, multi-method matrix. This allowed us to measure several dimensions of medical interviewing skills using several methods.
The results indicated that, of the four instruments compared, the MAAS is the best measure of medical interviewing skills: it shows both convergent and divergent validity and is the least influenced by the method of measurement.
Crijnen, A. A. M., & Kraan, H. F. (1987). Convergent and divergent validity of four methods of measurement of medical interviewing skills: a multitrait-multimethod approach. In H. F. Kraan & A. A. M. Crijnen (Eds.), The Maastricht History-taking and Advice Checklist – studies of instrumental utility (pp. 203–231). Lundbeck, Amsterdam.
Establishing convergent and divergent validity is one of the procedures suggested by Cronbach and Meehl (1955), and further elaborated by Campbell and Fiske (1959), for validating the meaning of measurements. It presumes that when two tests measure the same construct, they will correlate substantially.
Campbell and Fiske argue that demonstrating convergent and divergent validity requires that several measurements of the same construct confirm the meaning of that construct while, at the same time, measurements of other constructs support the distinct character of those constructs.
The essential notion in Campbell and Fiske’s approach is that each measurement is a combination of a trait and a method: it combines a particular content with a measurement procedure that is not specific to that content. They therefore suggest determining the relative contributions of trait and method variance to measurements by using more than one trait and more than one method in the validation process.
Support for convergent and divergent validity is obtained by constructing a multitrait-multimethod matrix (MTMM-matrix), consisting of the correlations between a number of traits, each measured by several methods. This matrix is then examined according to four criteria proposed by Campbell and Fiske.
Several authors recommend the examination of the multitrait-multimethod matrix as an ideal validation procedure in test development (Kerlinger, 1981; Thorndike, 1982). Although recommended, the MTMM-validation procedure has not often been applied in psychological research because of its demanding criteria.
In the present study, we investigate the convergent and divergent validity of the MAAS-MI in General Medicine:
In this section, we describe the construction of the multitrait-multimethod matrix and present Campbell and Fiske’s criteria and their relation to the matrix. An MTMM-matrix consists of the correlations that result when each of several traits is measured with each of several methods.
Table 1 displays an example of a MTMM-matrix for two traits (1, 2) and two methods (A, B).
Each trait is measured by each method, and all resulting scores are then intercorrelated.
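For the two-trait, two-method case, the lower triangle of such a matrix can be sketched as below, in the notation the criteria use (rA1B1 is the correlation between trait 1 measured by method A and trait 1 measured by method B). This is an illustrative layout, not a reproduction of Table 1, and the main diagonal is left empty here.

```latex
\[
\begin{array}{c|cc|cc}
       & A1       & A2       & B1       & B2 \\ \hline
A1     &          &          &          &    \\
A2     & r_{A1A2} &          &          &    \\ \hline
B1     & r_{A1B1} & r_{A2B1} &          &    \\
B2     & r_{A1B2} & r_{A2B2} & r_{B1B2} &
\end{array}
\]
```

In this layout, rA1A2 and rB1B2 form the heterotrait-monomethod triangles, rA1B1 and rA2B2 form the validity diagonal, and rA1B2 and rA2B1 are the heterotrait-heteromethod entries.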
Convergent and divergent validity are assessed by inspecting the MTMM-matrix according to the four Campbell and Fiske criteria (Campbell and Fiske, 1959; Schmitt et al, 1986).
The first criterion refers to the convergence of independent methods with regard to the measurement of a similar trait. It states that values on the validity diagonal should be large enough to warrant further examination of validity. As a minimal requirement, correlations on the validity diagonals (rA1B1 and rA2B2) ought to be statistically significant.
The second criterion pertains to the verification of distinctions between traits. This criterion states that values on the validity diagonal should be higher than the heterotrait-heteromethod correlations of the column and row in which the individual validity value is located. This criterion can be studied by examining whether a correlation on the validity diagonal (rA1B1) exceeds the strength of the correlations on the adjoining column and row (rA1B2 and rA2B1).
The third criterion determines the extent to which method variance contributes to the scores. It states that values on the validity diagonal must be higher than the off-diagonal values in the heterotrait-monomethod triangles: variables should correlate more highly with measurements of the same trait obtained by different methods than with measurements of other traits obtained by the same method. This criterion can be studied by examining whether correlations on the validity diagonals (rA1B1 or rA2B2) exceed the (median of the) correlation(s) of the heterotrait-monomethod triangles (rA1A2 or rB1B2).
The fourth criterion states that the patterns of trait inter-relationship should be the same in all heterotrait triangles in both monomethod and heteromethod blocks. Campbell and Fiske have not developed procedures to establish this criterion unequivocally.
In summary, convergent and divergent validity are ascertained when measurements behave according to the four Campbell and Fiske criteria.
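Written out for the two-trait, two-method example and for the validity value rA1B1 (analogous inequalities apply to rA2B2), the first three criteria amount to the conditions below. The t-statistic is the standard test of a Pearson correlation against zero with N observations; with more than two traits, criterion 3 compares against the median of each heterotrait-monomethod triangle rather than a single value. Criterion 4 is omitted because, as noted, it has no agreed operationalization.

```latex
\[
\begin{aligned}
&\text{Criterion 1:} && t = r_{A1B1}\sqrt{\frac{N-2}{1-r_{A1B1}^{2}}},\quad df = N-2,\quad \text{require } p < .05\\
&\text{Criterion 2:} && r_{A1B1} > r_{A1B2} \quad\text{and}\quad r_{A1B1} > r_{A2B1}\\
&\text{Criterion 3:} && r_{A1B1} > r_{A1A2} \quad\text{and}\quad r_{A1B1} > r_{B1B2}
\end{aligned}
\]
```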
Over the course of time, several more advanced statistical procedures for analyzing MTMM-matrices have been developed (Marsh et al, 1983; Schmitt et al, 1986).
Nevertheless, we approach the analysis of our MTMM-matrix of measurements of medical interviewing skills by means of the original Campbell and Fiske criteria.
In the following section, we will describe:
The methods of measurement employed in the MTMM validation procedure are expected to meet several requirements:
Fiske (1971) elaborated the first requirement by stating that the procedures to obtain scores consist of a chain of events between the original behavior of the subject and the assignment of an index.
He distinguished three features in this process:
Finally, we constructed four independent methods of measuring medical interviewing skills by varying the mode of measurement and the procedure of index construction:
Each measure therefore differed as much as possible from the others. Table 2 shows the modes, the index producers and the resulting measurement instruments.
In the following paragraphs, each of the four methods is briefly characterized.
The instrument is classified as an observation of behavior mode with indices production by the researchers.
This method is classified as a self-description mode with indices production by the researchers.
This method utilizes the observation of behavior mode whereas indices are produced by general practitioners.
This method is classified as a self-description mode whereas the indices are produced by the physicians.
Campbell and Fiske require the methods used in the MTMM-validation procedure to be independent in order to minimize the influence of method covariance.
In addition to the independence of methods, Campbell and Fiske require that the traits also be independent. This requirement was intended to produce near-zero values in the heterotrait-heteromethod and heterotrait-monomethod triangles, and to maximize the differences between the validity coefficients, the trait intercorrelations and the method influences. On reviewing these requirements, Fiske (1971) stated that one can never establish empirically that a trait is uncorrelated with all other traits; this relaxed the requirement of trait independence. It is therefore sufficient that traits be theoretically distinct before they are employed in an MTMM validation procedure.
The theoretical considerations leading to the classification of medical interviewing skills into six dimensions are elaborated in the chapters mentioned above. Here we confine ourselves to summarizing the characteristics of the dimensions.
These six dimensions constitute the elements of an appropriate initial interview in primary health care. A summary of methods and dimensions is shown.
All forty residents of the Department of General Practice who participated in the 1984-1985 residency program took part in this study. These 31 men and 9 women (mean age: 28.6 years) had finished medical school at a mean age of 26.4. Before starting the residency program, they had worked in health care for an average of 10 months. Physicians were not selected on their interviewing skills for admission to the residency program. At the time of the study, 11 residents had almost completed the residency program and the others had just started their training (mean: 5 months in the residency program; range: 1-11 months). During the program, the residents received on average 5-10 hours of courses in medical interviewing skills. More than half of these residents (23) had followed the undergraduate curriculum at Maastricht Medical School; 11 came from Nijmegen, 1 from Utrecht, 2 from Amsterdam (UvA) and 3 from Groningen.
To secure optimal conditions for measurement, comparability and control, a simulated consultation hour was created in which 40 residents in General Practice interviewed four different simulated patients (Crijnen et al, 1986). Two weeks before the simulated consultation hour took place, residents were informed about the goal and procedures of the study.
During the simulated consultation hours, residents were asked to behave as if they had taken charge of a colleague’s practice and to perform a complete medical consultation with each simulated patient. Since physical examinations formed no part of the research setting, information about the patient’s physical condition was given to the resident on request. Six rooms with video-equipment were at our disposal in the Skills Laboratory at Maastricht Medical School.
Following a signal, observers switched on the video-equipment and a simulated patient entered the consultation room. Residents were allowed to speak for a maximum of 15 minutes with the patients.
Several months later, eleven general practitioners recruited from the Department of Family Medicine and the Skills Laboratory were asked to observe the videotaped medical interviews and to rate the quality of the physicians’ medical interviewing skills by means of the MAAS-MI Global (instrument 3). Each interview was observed by two randomly assigned general practitioners. These general practitioners were considered experts in general practice because of their long experience in general practice and their teaching positions in the undergraduate medical curriculum or the General Practice residency program. Clinical experience was important to anchor MAAS-MI General-scores to clinical reality and relevance, whereas educational experience was considered to enhance the experts’ understanding of what constitutes a good medical interview.
All videotaped medical interviews were observed a second time, using the MAAS-MI G, by one of three observers. The scale scores from the live and the video observations were summed to enhance the reliability of measurement.
The MTMM validation procedure is based on case presentations by three simulated patients presenting complaints accompanying a myocardial infarction. The myocardial infarction case, borrowed from a real case history and documented by a general practitioner and a psychologist, described a 50-year-old building contractor who was worried about his heart because the night before he had experienced a short attack of intense chest pain. The patient had smoked for years and had recently gone through a period of severe problems as a result of the economic recession. The simulated patients were recruited from the Skills Laboratory and were instructed by a psychologist and a general practitioner to present the case naturally.
Before constructing the multitrait-multimethod matrix, data were prepared for computations.
Firstly, mean scores, standard deviations and ranges were computed for each trait measured by each scale. Results are presented in Table 4.
Secondly, the multitrait-multimethod matrix was constructed by computing Pearson product-moment correlations between the six traits measured by each of the four methods. The median was calculated for each validity-diagonal, each heterotrait-heteromethod triangle and each heterotrait-monomethod triangle. Because each validity-diagonal contains six values (one per trait), its median was taken as the value halfway between the two middle values. Results are displayed in Table 5.
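A minimal sketch of this step, assuming the scores are held in a dictionary keyed by (method, trait) with one score per physician; the data layout and the function names are hypothetical, and the demo data are random. It computes the Pearson correlation for every pair of trait-method combinations and then the medians of each validity diagonal, of the two heterotrait-heteromethod triangles in every heteromethod block, and of each heterotrait-monomethod triangle. Python's `statistics.median` already returns the midpoint of the two middle values for an even number of values, which matches the rule used for the six-value validity diagonals.

```python
import random
from itertools import combinations
from statistics import median


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mtmm_matrix(scores):
    """scores maps (method, trait) -> list of scores, one per physician.
    Returns r keyed by frozenset({(method1, trait1), (method2, trait2)})."""
    return {frozenset((a, b)): pearson(scores[a], scores[b])
            for a, b in combinations(sorted(scores), 2)}


def block_medians(r, methods, traits):
    """Medians of the validity diagonals and heterotrait triangles."""
    def get(m1, t1, m2, t2):
        return r[frozenset(((m1, t1), (m2, t2)))]

    k = len(traits)
    out = {}
    for m1, m2 in combinations(methods, 2):
        out[("validity", m1, m2)] = median([get(m1, t, m2, t) for t in traits])
        # Two heterotrait triangles per heteromethod block, one on each side
        # of the validity diagonal (i indexes the m1 trait, j the m2 trait).
        out[("heteromethod", m1, m2, 1)] = median(
            [get(m1, traits[i], m2, traits[j]) for i in range(k) for j in range(k) if i > j])
        out[("heteromethod", m1, m2, 2)] = median(
            [get(m1, traits[i], m2, traits[j]) for i in range(k) for j in range(k) if i < j])
    for m in methods:
        out[("monomethod", m)] = median(
            [get(m, t1, m, t2) for t1, t2 in combinations(traits, 2)])
    return out


# Hypothetical demo: 33 physicians, 6 traits, 4 methods, random scores.
random.seed(1)
methods = ["General", "Self", "Global", "GlobalSelf"]
traits = ["T1", "T2", "T3", "T4", "T5", "T6"]
true_level = {t: [random.gauss(0, 1) for _ in range(33)] for t in traits}
scores = {(m, t): [v + random.gauss(0, 1) for v in true_level[t]]
          for m in methods for t in traits}
r = mtmm_matrix(scores)
meds = block_medians(r, methods, traits)
print(len(r), len(meds))
```

Keying correlations by a frozenset makes lookups independent of the order in which the two trait-method combinations are named.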
To establish the first criterion, the number of significant correlations (p<.05) on the validity-diagonals was counted for both traits and methods. Results are displayed in Table 6.
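Criterion 1 then reduces to counting. The sketch below is a hypothetical illustration, not the authors' code: correlations are keyed by a frozenset of two (method, trait) pairs, and the default critical value r_crit = .29 is taken from the note to Table 5 (N = 33, p < .05). Crediting each validity value to both methods it involves is our reading of "counted for both traits and methods"; the demo correlations are invented.

```python
from itertools import combinations


def pair(m1, t1, m2, t2):
    """Order-independent key for the correlation between (m1, t1) and (m2, t2)."""
    return frozenset(((m1, t1), (m2, t2)))


def count_significant_validity(r, methods, traits, r_crit=0.29):
    """Criterion 1: count validity-diagonal correlations above r_crit.

    Each validity value involves two methods, so it is credited to both.
    """
    per_trait = {t: 0 for t in traits}
    per_method = {m: 0 for m in methods}
    for m1, m2 in combinations(methods, 2):
        for t in traits:
            if r[pair(m1, t, m2, t)] > r_crit:
                per_trait[t] += 1
                per_method[m1] += 1
                per_method[m2] += 1
    return per_trait, per_method


# Hypothetical validity values for three methods and two traits.
demo_methods = ["G", "S", "Gl"]
demo_traits = ["H", "P"]
demo_r = {
    pair("G", "H", "S", "H"): 0.50, pair("G", "P", "S", "P"): 0.10,
    pair("G", "H", "Gl", "H"): 0.35, pair("G", "P", "Gl", "P"): 0.31,
    pair("S", "H", "Gl", "H"): 0.20, pair("S", "P", "Gl", "P"): 0.05,
}
per_trait, per_method = count_significant_validity(demo_r, demo_methods, demo_traits)
# Each method takes part in (number of methods - 1) * (number of traits) validity values.
n_per_method = (len(demo_methods) - 1) * len(demo_traits)
print(per_trait, {m: c / n_per_method for m, c in per_method.items()})
```

With four methods and six traits, each method takes part in 3 × 6 = 18 validity values, which is the denominator for per-method percentages.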
To establish the second criterion, the number of times that correlations on the validity-diagonal exceeded the strength of the correlations in the corresponding column and row of the two adjoining heterotrait-heteromethod triangles was counted. Each value on the validity-diagonal was compared to 10 other correlations (5 in its row and 5 in its column). Results are displayed in Table 7.
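A minimal sketch of this count, using the same hypothetical frozenset keys as above. For each pair of methods and each trait, the validity value is compared with the heterotrait-heteromethod correlations in its row and column; reading "exceeded the strength" as exceeding the absolute value of the rival correlation is an assumption.

```python
from itertools import combinations


def pair(m1, t1, m2, t2):
    """Order-independent key for the correlation between (m1, t1) and (m2, t2)."""
    return frozenset(((m1, t1), (m2, t2)))


def criterion2_counts(r, methods, traits):
    """Criterion 2: for each validity value, count how many heterotrait-
    heteromethod correlations in its row and column it exceeds.

    Returns {(m1, m2, trait): (wins, comparisons)}; with six traits each
    validity value has 5 + 5 = 10 comparisons.
    """
    results = {}
    for m1, m2 in combinations(methods, 2):
        for t in traits:
            validity = r[pair(m1, t, m2, t)]
            others = [u for u in traits if u != t]
            # trait t on method m1 against the other traits on method m2
            rivals = [r[pair(m1, t, m2, u)] for u in others]
            # the other traits on method m1 against trait t on method m2
            rivals += [r[pair(m1, u, m2, t)] for u in others]
            wins = sum(validity > abs(x) for x in rivals)
            results[(m1, m2, t)] = (wins, len(rivals))
    return results


# Hypothetical two-trait, two-method example.
demo_r = {
    pair("A", "a", "B", "a"): 0.60,  # validity value, trait a
    pair("A", "b", "B", "b"): 0.15,  # validity value, trait b
    pair("A", "a", "B", "b"): 0.10,  # heterotrait-heteromethod
    pair("A", "b", "B", "a"): 0.20,  # heterotrait-heteromethod
}
demo = criterion2_counts(demo_r, ["A", "B"], ["a", "b"])
print(demo)
```

In the demo, the validity value for trait a exceeds both of its rivals (2 of 2), whereas the value for trait b exceeds only one of them (1 of 2).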
To establish the third criterion, the number of times that the three validity-values for each trait exceeded the median of each heterotrait-monomethod triangle was counted. Results are displayed in Table 8. A second operationalization of the third criterion by comparing the validity-values with each of the correlations in the monomethod triangles separately revealed essentially the same information and is therefore not presented.
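A minimal sketch of this count under one reading of the procedure: for each method m, the median of its heterotrait-monomethod triangle is computed, and for each trait the validity values that involve m (three of them when there are four methods) are compared with that median. The keys, function names and demo values are hypothetical.

```python
from statistics import median


def pair(m1, t1, m2, t2):
    """Order-independent key for the correlation between (m1, t1) and (m2, t2)."""
    return frozenset(((m1, t1), (m2, t2)))


def criterion3_counts(r, methods, traits):
    """Criterion 3: {(method, trait): (validity values above the median of
    that method's heterotrait-monomethod triangle, validity values compared)}."""
    results = {}
    for m in methods:
        mono = median([r[pair(m, t1, m, t2)]
                       for i, t1 in enumerate(traits) for t2 in traits[i + 1:]])
        for t in traits:
            # one validity value per other method
            validity = [r[pair(m, t, other, t)] for other in methods if other != m]
            results[(m, t)] = (sum(v > mono for v in validity), len(validity))
    return results


# Hypothetical example: three methods, two traits.
demo_r = {
    # heterotrait-monomethod values
    pair("A", "a", "A", "b"): 0.40, pair("B", "a", "B", "b"): 0.20,
    pair("C", "a", "C", "b"): 0.60,
    # validity values
    pair("A", "a", "B", "a"): 0.50, pair("A", "b", "B", "b"): 0.30,
    pair("A", "a", "C", "a"): 0.45, pair("A", "b", "C", "b"): 0.10,
    pair("B", "a", "C", "a"): 0.70, pair("B", "b", "C", "b"): 0.25,
}
demo = criterion3_counts(demo_r, ["A", "B", "C"], ["a", "b"])
print(demo)
```

A method with a high monomethod median, such as C in the demo, makes it hard for its validity values to pass, which is the pattern the criterion is designed to expose.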
To establish the fourth criterion, we considered factor-analyzing parts of our correlation matrix. Due to an insufficient number of participating physicians, factor-analyses would yield unstable results. It was therefore decided not to study the fourth criterion.
Inspection of Table 4 reveals that, according to MAAS-MI G-scores, most physicians display only a limited number of the interviewing skills that can be displayed during a medical consultation. Furthermore, averaged MAAS-MI G-scores are lower than MAAS-MI Self-scores, and MAAS-MI Global-scores are lower than MAAS-MI Global-Self-scores. MAAS-MI G-scores are below or just above the scale midpoints, whereas scores on MAAS-MI Self, MAAS-MI Global and MAAS-MI Global-Self are above the scale midpoints. Standard deviations are almost identical for MAAS-MI G and MAAS-MI Self, and for MAAS-MI Global and MAAS-MI Global-Self. The range of scores shows that MAAS-MI G-scales never reach the upper limits of scoring, whereas all other scales do.
With regard to the first criterion, for MAAS-MI G and MAAS-MI Self, about 60% of the correlations on the validity diagonal are significant, and for MAAS-MI Global and MAAS-MI Global-Self respectively, 50% and 40% (Table 6). The validity of the dimensions Structuring and Interpersonal Skills is supported almost always; of the dimensions History-taking, Presenting Solutions and Communication Skills, only half of the time; the validity of the dimension Exploring Reasons for Encounter is never supported.
Taking into account that support by MAAS-MI Global and MAAS-MI Self is most important, we conclude that the validity of the MAAS-MI G-scales History-taking, Presenting Solutions, Structuring and Interpersonal Skills is confirmed, that the validity of the scale Exploring Reasons for Encounter is discredited, and that the validity of the scale Communication Skills is neither supported nor discredited.
With regard to the second criterion, inspection of Table 7 shows that Presenting Solutions and Structuring the interview, and, to a lesser extent, History-taking and Interpersonal Skills, can be clearly differentiated from each other. The results support the distinct character of these dimensions. Exploring Reasons for Encounter and Communication Skills are differentiated less clearly from other dimensions.
The number of times that validity-diagonals between pairs of methods exceed the correlations of the adjoining rows and columns, shown in Table 7, discloses that MAAS-MI G/MAAS-MI Global and MAAS-MI G/MAAS-MI Self distinguish the dimensions in medical interviewing skills adequately. The other combinations appear to differentiate the dimensions less well. MAAS-MI G seems best able to discern different dimensions of medical interviewing skills. MAAS-MI Global and MAAS-MI Self are second and third, whereas MAAS-MI Global-Self is almost unable to distinguish dimensions of interviewing skills.
With regard to the third criterion, the median of the monomethod triangles, shown in Table 5, provides information on the considerable impact of the method on the measurement of medical interviewing skills.
The differences between methods are strikingly large, ranging from a median of .22 for MAAS-MI G to a median of .63 for MAAS-MI Global.
The results summarized in Table 8 reveal that MAAS-MI G, followed by MAAS-MI Self, is least affected by the method of measurement. The measurement qualities of both rating scales, MAAS-MI Global and MAAS-MI Global-Self, are considered to be impaired by strong method influences. Moreover, inter-observer reliability for MAAS-MI Global, expressed as Pearson product-moment correlations between pairs of experts, is low to moderate: correlations between experts on each item vary from .15 to .42.
Below diagonal: Pearson correlation coefficients between 4 methods and 6 traits of medical interviewing skills.
Above diagonal: median values for the validity-diagonals and, in brackets and bold, for the heterotrait-heteromethod and heterotrait-monomethod triangles.
On diagonal and in brackets: Correlation between two experts in MAAS-MI Global Expert Rating Scale.
N = 33 physicians; case: myocardial infarction; if r > .29, then p < .05; if r > .40, then p < .01.
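The note to Table 5 gives r > .29 for p < .05 and r > .40 for p < .01 with N = 33. As a check on such thresholds, the sketch below recomputes critical values of r from Student's t-distribution, with t = r * sqrt(df / (1 - r^2)) and df = N - 2, using only the Python standard library. The Simpson-rule integration and the bisection search are our own illustrative choices, not the authors' procedure. For N = 33 it gives roughly .29 and .40 for one-tailed tests and roughly .34 and .44 for two-tailed tests, so the note's values correspond to one-tailed thresholds; the note itself does not state which was used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0: 0.5 minus the integral of the density over [0, t],
    computed with Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def r_p_value(r, n, tails=1):
    """p-value for testing a Pearson r against zero with n observations."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    return tails * t_upper_tail(t, df)


def critical_r(n, alpha, tails=1):
    """Smallest r (found by bisection) whose p-value is at most alpha."""
    lo, hi = 0.0, 0.999
    for _ in range(40):
        mid = (lo + hi) / 2
        if r_p_value(mid, n, tails) > alpha:
            lo = mid  # not yet significant: the threshold lies higher
        else:
            hi = mid
    return hi


for tails in (1, 2):
    print(tails, round(critical_r(33, 0.05, tails), 3), round(critical_r(33, 0.01, tails), 3))
```

If SciPy is available, `scipy.stats.t.ppf` gives the critical t directly, and r_crit = t / sqrt(t^2 + df).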
The following section discusses the convergent and divergent validity of the MAAS-MI in General Medicine, examined according to Campbell and Fiske’s criteria (1959). In addition, the validity of three other methods of measuring medical interviewing skills is scrutinized.
Criterion 1 states that values on the validity-diagonal should be large enough to support convergent validity.
For MAAS-MI General, the confirmation of convergent validity by MAAS-MI Global is especially encouraging because it indicates that general practitioners with experience in medical education and primary health care agree with the operationalizations by MAAS-MI General-scales of the dimensions History-taking, Presenting Solutions, Structuring and Interpersonal Skills.
Validity of MAAS-MI General measures of interviewing skills is strongly confirmed
Moreover, convergent validity of these scales is underscored by physicians’ recordings of their own interviewing skills in MAAS-MI Self.
The insufficient evidence of convergent validity for the scales Exploring Reasons for Encounter and Communication Skills is disappointing and needs further elaboration. The lack of validity can be attributed to:
With regard to the first reason, the operationalization of Exploring Reasons for Encounter, we consider one aspect to be missing. In addition to eliciting information about factors in the pre-patient phase leading to the visit, patients should be asked to formulate their request for help explicitly. Item 6 in the MAAS-MI General clearly pertains to the patient’s request for help, but in our opinion more attention should be given to this issue because of its steering influence on the content and process of an initial interview.
Eisenthal and Lazare (1976, 1983) found that interview behavior which helped the patient to put his request into words was related to feelings of being helped, of satisfaction and plan wanted. Patients find it difficult to verbalize their request for help, although they consider it very important. Structuring activity by the physician and his collaborative involvement stimulate patients to formulate their request for help. More items on the scale Exploring Reasons for Encounter must therefore focus on this issue.
A second reason for insufficient support of convergent validity is found in the characteristics of global rating scales which are considered to impair the quality of measurement. This issue is discussed with the third criterion.
Communication Skills, on the other hand, are measured unreliably by the MAAS-MI General, which hinders the determination of any form of validity (see MAAS-MI General).
For MAAS-MI Self, evidence of convergent validity is available for the scales History-taking and Structuring and, to a lesser extent, for Presenting Solutions and Interpersonal Skills. No evidence of convergent validity is obtained for Exploring Reasons for Encounter and Communication Skills.
MAAS-MI Self is a valid tool for self-evaluation of interviewing skills in medical school and residency training
The same is essentially true for MAAS-MI Self as for MAAS-MI General, but because a classical test-retest design cannot be carried out, unreliability of MAAS-MI Self has to be taken into account as a confounding influence on the validation process.
For MAAS-MI Global, the validity for History-taking, Presenting Solutions, Structuring and Interpersonal Skills is confirmed; the validity of Exploring Reasons for Encounter is discredited, and the validity for Communication Skills is neither supported nor discredited.
With regard to the measurement characteristics of global rating scales, it is known that raters are unable to assess more than two dimensions of performance accurately. In medical education, physicians mostly discern a problem-solving dimension and an interpersonal-skill dimension, which largely agrees with the results presented here (Dielman et al, 1980; Streiner, 1985).
For MAAS-MI Global-Self, convergent validity of Interpersonal Skills is unequivocally supported by the validity coefficients. Apparently, a physician’s experience of their interpersonal skills displayed during the interview agrees with the impression of MAAS-MI General-observers and experts. This is important because it confirms the validity of an important but difficult-to-measure quality of a medical interview.
MAAS-MI Global-Self is well-able to measure interpersonal skills in a medical consultation
Since convergent validity of the other dimensions is supported only by strong correlations with MAAS-MI Self and not by MAAS-MI General or MAAS-MI Global, we conclude that the validity of global self-rating scales of medical interviewing skills has to be questioned, with the exception of measures of interpersonal skills.
MAAS-MI General, MAAS-MI Self and MAAS-MI Global display evidence of convergent validity for History-taking, Presenting Solutions, Structuring and Interpersonal Skills. Insufficient evidence was obtained to support convergent validity of the Exploration of Reasons for Encounter and Communication Skills.
For MAAS-MI Global-Self, convergent validity is obtained for the measure of Interpersonal Skills, whereas the validity of the other measures is discredited.
To support divergent validity with regard to the dimensions of interest, criterion 2 states that values on the validity-diagonal should be higher than the values of the corresponding column and row in the heterotrait-heteromethod triangle. Campbell and Fiske’s goal was to verify a method of measurement’s capability of distinguishing the dimension of interest from several other dimensions. They required the median of each heterotrait-heteromethod triangle to approach zero in order to enhance determination of divergent validity. The median values, shown in Table 5, reveal that none of them approaches zero, which suggests that the methods and/or the dimensions are related. We expected this to occur because we were not able to construct totally independent methods of measurement and because the theoretical dimensions which were discerned in medical interviewing skills will be related to some extent.
The results in the right-hand column of Table 7 show that the dimensions Presenting Solutions and Structuring and, to a lesser extent, History-taking and Interpersonal Skills are clearly differentiated from each other. These results support the distinct character of the dimensions and underscore the theoretical considerations that led to the differentiation of medical interviewing skills into six distinct dimensions. Once again, Exploring Reasons for Encounter and Communication Skills are less well discerned because of low correlations on the validity diagonals.
The combination of MAAS-MI General/MAAS-MI Global and MAAS-MI General/MAAS-MI Self distinguishes the different types of medical interviewing skills most adequately as is shown in Table 7. The other combinations of methods differentiate the dimensions less well. MAAS-MI General thus appears to be the best able to discern different types of medical interviewing skills.
MAAS-MI General discerns different types of medical interviewing skills best: History-taking, Presenting Solutions, Structuring the interview and Interpersonal Skills are distinct types of medical interviewing skills, Exploration of Reasons for Encounter and Communication Skills are harder to distinguish
MAAS-MI Global and MAAS-MI Self are second and third, whereas MAAS-MI Global-Self is almost unable to discern dimensions with the exception of Interpersonal skills.
Four dimensions of medical interviewing skills that were discerned theoretically and used to construct the MAAS-MI General-scales can be distinguished empirically. History-taking, Presenting Solutions, Structuring the interview and Interpersonal Skills are distinct types of medical interviewing skills.
Difficulties arise in distinguishing the dimensions referring to the Exploration of Reasons for Encounter and Communication Skills.
The third criterion was formulated to secure optimal measurement of the dimensions because every psychological measurement device is characterized by features that are specific to the dimension of interest and other features which are characteristic for the method being employed. Criterion 3 states that values on the validity diagonal must be higher than the off-diagonal values in the monomethod triangle. Since the process of measuring always elicits irrelevant method variance, measurements are considered to be invalidated to the extent that method variance contributes to the scores obtained.
A look at Table 8 reveals that features of the measurement process impinge strongly upon the scores obtained with MAAS-MI Global and MAAS-MI Global-Self, whereas a smaller influence of the method is observed for MAAS-MI Self and MAAS-MI General. Since the interview behavior on which the data are based was similar for all methods, the differences can be attributed to the methods that were employed.
Of all methods, MAAS-MI General demonstrates the best measurement properties because it evokes a low degree of method variance and shows considerable correlations on the validity-diagonals.
Once again, Exploring Reasons for Encounter and Communication Skills are measured improperly and are therefore primarily responsible for the failure of MAAS-MI General on the third criterion. As we performed a generalizability study, discussed in MAAS-MI General, we know that Exploring Reasons for Encounter in particular is measured fairly reliably with low levels of method variance. Furthermore, one of Campbell and Fiske’s requirements is that each of the methods employed should measure the dimensions as conceptualized appropriately. It is therefore our opinion that the lack of success of the Exploration of Reasons for Encounter on the third criterion can be partly attributed to the failure of the other methods to measure this dimension properly.
MAAS-MI Self displays more method variance than MAAS-MI General, and less than MAAS-MI Global and MAAS-MI Global-Self. Table 4 shows that the mean of each MAAS-MI Self scale is considerably higher than the mean of the corresponding MAAS-MI General scale. This interesting finding demonstrates that interviewing physicians believe that they perform more facets of interview behavior than they actually do. In examination situations, we have often noticed that medical students mix up information given by the patient with their own questioning behavior: when they received information, they often thought they had asked for it. This introduces unreliability into MAAS-MI Self measures. In conclusion, influences of the self-description mode applied in MAAS-MI Self are likely to interfere negatively with the measurement of the dimensions.
It is evident that MAAS-MI Global is affected strongly by the method of measurement, which consists of observation of behavior and, subsequently, a rating by experts. Although a considerable influence of the method was expected to occur, we were surprised by the strength of the halo-effect.
MAAS-MI Global is strongly plagued by method variance, especially halo-effects, but also by leniency and central tendency
Halo:
We asked general practitioners with experience in both general practice and medical education to participate in this study, especially because they were supposed to be able to distinguish the occurrence and quality of different medical interviewing skills. However, even experts experience difficulties in discerning dimensions in medical interviewing skills when no clearly-worded and well-defined items are available.
Ratings of distinct types of interviewing skills displayed during a medical consultation are reduced to a judgment about a problem-solving and an interpersonal-skills dimension (Dielman et al, 1980; Streiner, 1985).
A second type of method influence, leniency (a rater’s tendency to assign systematically higher or lower ratings to a subject’s behavior), also appears to occur: all averaged ratings of MAAS-MI Global are above the midpoint of the scales, and most experts use the positive part of the scale continuum.
Finally, restriction of range, the third type of method influence, seems to occur because most raters do not use the extreme ends of the scales; the negative side in particular is almost never used. We therefore conclude that strong halo-effects, leniency and central tendency are all likely to impede the measurement properties of the MAAS-MI Global.
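The indicators relied on here, averaged ratings relative to the scale midpoint and the (non-)use of the scale ends, are easy to tabulate. A minimal sketch for one rating scale, assuming integer ratings between a lowest and a highest category; the function name and the 1-7 range in the example are hypothetical, since the number of scale points is not given in this section. These are descriptive indicators, not formal tests of leniency or restriction of range.

```python
from statistics import mean


def rating_diagnostics(ratings, low, high):
    """Descriptive indicators of leniency and restriction of range for one scale."""
    midpoint = (low + high) / 2
    n = len(ratings)
    return {
        # > 0 suggests leniency (ratings pulled towards the positive end)
        "mean_minus_midpoint": mean(ratings) - midpoint,
        "share_above_midpoint": sum(x > midpoint for x in ratings) / n,
        # near 0 suggests the negative end of the scale is avoided
        "share_at_low_end": sum(x == low for x in ratings) / n,
        "share_at_high_end": sum(x == high for x in ratings) / n,
        # few categories used suggests restricted range
        "categories_used": len(set(ratings)),
    }


# Hypothetical ratings on a hypothetical 1-7 scale.
d = rating_diagnostics([5, 6, 6, 7, 5, 4], low=1, high=7)
print(d)
```

In this example the mean lies 1.5 points above the midpoint and the lowest category is never used, the two patterns the text describes.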
The influence of the method, especially of the halo-effect, on MAAS-MI Global-Self is considered to be close to that on MAAS-MI Self, because the median of its monomethod triangle is near the median for MAAS-MI Self. We had expected halo in MAAS-MI Global-Self to approach halo in MAAS-MI Global, because both global rating scales share the feature that the behavior of interest is not well defined. It seems that physicians who are in the actual interview situation experience more differentiation than experts do.
Moreover, leniency is suspected of influencing MAAS-MI Global-Self strongly, because averaged ratings on MAAS-MI Global-Self are significantly above the midpoint of the scales, leading to a decrease in the amount of variance. Interviewing physicians rate the quality of their own interview behavior more positively in comparison with observers’ ratings. Restriction of range, finally, seems to take place because categories on the negative side of the scales are almost never used.
In conclusion, we observe that halo, leniency and restriction of range in particular tend to diminish the measurement properties of the MAAS-MI Global-Self. Since MAAS-MI Global-Self uses one item which is not well defined to represent each dimension, this method of measuring medical interviewing skills is considered to be highly unreliable.
Campbell and Fiske constructed this criterion to identify the major sources of variation in measurements and to establish whether enough trait variance was measured to support sound measurement; it thus serves as a safeguard on the measurement process. In our study, the third criterion clearly exposes the difficulties that arise in psychological measurement: halo, leniency and restriction of range occur in varying degrees in our measurements, but MAAS-MI General appears to be least influenced by method variance.
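In Campbell and Fiske's formulation, the third criterion asks that a variable correlate more highly with an independent measure of the same trait (a validity value) than with measures of different traits that share its method. The sketch below is one hypothetical operationalization: it checks each validity value against every heterotrait-monomethod correlation involving either of its two variables. The matrix, labels and the function name `criterion_three` are assumptions for illustration.

```python
# Hypothetical 3-trait x 2-method correlation matrix (not data from this study).
LABELS = [(t, m) for m in ("A", "B") for t in ("T1", "T2", "T3")]
R = [
    [1.00, 0.70, 0.60, 0.50, 0.20, 0.10],
    [0.70, 1.00, 0.80, 0.25, 0.55, 0.15],
    [0.60, 0.80, 1.00, 0.10, 0.20, 0.45],
    [0.50, 0.25, 0.10, 1.00, 0.30, 0.20],
    [0.20, 0.55, 0.20, 0.30, 1.00, 0.40],
    [0.10, 0.15, 0.45, 0.20, 0.40, 1.00],
]


def criterion_three(r, labels):
    """Does each validity value exceed all same-method, different-trait correlations?"""
    n = len(labels)
    results = {}
    for i in range(n):
        for j in range(i + 1, n):
            (ti, mi), (tj, mj) = labels[i], labels[j]
            if ti != tj or mi == mj:
                continue  # not a validity value (same trait, different method)
            # Heterotrait-monomethod correlations of variable i and of variable j.
            rivals = [
                r[a][b]
                for a in (i, j)
                for b in range(n)
                if labels[b][1] == labels[a][1] and labels[b][0] != labels[a][0]
            ]
            results[(ti, mi, mj)] = all(r[i][j] > v for v in rivals)
    return results


check3 = criterion_three(R, LABELS)
```

In the toy matrix every validity value fails, because method A produces high correlations between different traits, which is how strong method variance (such as halo) shows up under this criterion.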
The fourth criterion states that the pattern of trait interrelationships should be the same in all heterotrait triangles of both the monomethod and the heteromethod blocks in order to provide evidence for divergent validity. Satisfying this criterion would suggest that the underlying traits are truly correlated, whereas failing it would imply that the observed correlation between traits assessed by a given method is due to method or halo bias (Marsh and Hocevar, 1983). Interpreting the fourth criterion has posed problems for us and for several other researchers because Campbell and Fiske did not operationalize it. Some authors have merely mentioned the fourth criterion without applying it to their data (Marsh and Hocevar, 1983); others have considered it too strict and therefore unrealistic (Magnusson, 1966).
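Because Campbell and Fiske did not operationalize the fourth criterion, any implementation is a choice. The sketch below uses one simple, hypothetical operationalization: extract each heterotrait triangle as a set of trait-pair correlations and compare the rank order of the trait pairs across triangles. The matrix, labels and the functions `triangle` and `rank_order` are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical 3-trait x 2-method correlation matrix (not data from this study).
LABELS = [(t, m) for m in ("A", "B") for t in ("T1", "T2", "T3")]
R = [
    [1.00, 0.70, 0.60, 0.50, 0.20, 0.10],
    [0.70, 1.00, 0.80, 0.25, 0.55, 0.15],
    [0.60, 0.80, 1.00, 0.10, 0.20, 0.45],
    [0.50, 0.25, 0.10, 1.00, 0.30, 0.20],
    [0.20, 0.55, 0.20, 0.30, 1.00, 0.40],
    [0.10, 0.15, 0.45, 0.20, 0.40, 1.00],
]


def triangle(r, labels, row_method, col_method):
    """Trait-pair correlations of one heterotrait triangle.

    Passing the same method twice gives a monomethod triangle; (A, B) and
    (B, A) give the two triangles of the A-B heteromethod block.
    """
    pos = {label: k for k, label in enumerate(labels)}
    traits = sorted({t for t, _ in labels})
    return {
        (t1, t2): r[pos[(t2, row_method)]][pos[(t1, col_method)]]
        for t1, t2 in combinations(traits, 2)
    }


def rank_order(tri):
    """Trait pairs from strongest to weakest correlation (ties broken by name)."""
    return sorted(tri, key=lambda pair: (-tri[pair], pair))


patterns = {
    (m1, m2): rank_order(triangle(R, LABELS, m1, m2))
    for m1, m2 in [("A", "A"), ("B", "B"), ("A", "B"), ("B", "A")]
}
# The criterion (in this reading) holds only if every triangle has the same order.
same_pattern = len({tuple(p) for p in patterns.values()}) == 1
```

With many traits and methods, exact rank agreement becomes very strict, which echoes the objection that the criterion is unrealistic; a rank correlation between triangles would be a softer alternative.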
In our study it is unrealistic to interpret a correlation matrix consisting of 288 correlations: each interpretation can be refuted by other correlations that suggest a different explanation. Furthermore, the correlation matrix (or parts of it) cannot be factor analyzed because of the small number of students (see also MAAS-MI Mental Health). We therefore decided not to apply the fourth criterion to our correlation matrix. This decision reflects the observation that no clear pattern of interrelations between the dimensions emerged in our data, and that method influences are likely to affect the strength of the correlations, as had already been revealed in interpreting the third criterion.
The convergent and divergent validity of the MAAS-MI General, in addition to the validity of three other methods of measurement of medical interviewing skills, is studied by means of the multi-trait, multi-method matrix. In the multi-trait, multi-method matrix, several dimensions in medical interviewing skills are measured with several methods. The resulting correlation matrix is scrutinized by means of four criteria which were developed by Campbell and Fiske (1959).
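The criteria operate on three kinds of off-diagonal cells in the multi-trait, multi-method matrix: validity values (same trait, different methods), heterotrait-monomethod values and heterotrait-heteromethod values. A minimal sketch of that classification, assuming each measure is labelled by a hypothetical (trait, method) pair; the function `classify_cells` and the labels are illustrative only:

```python
def classify_cells(labels):
    """Label each off-diagonal cell (i < j) of a multitrait-multimethod matrix.

    `labels` is a list of unique (trait, method) pairs, one per measure.
    """
    kinds = {}
    for i, (ti, mi) in enumerate(labels):
        for j in range(i + 1, len(labels)):
            tj, mj = labels[j]
            if ti == tj:
                kinds[(i, j)] = "validity"                  # same trait, different method
            elif mi == mj:
                kinds[(i, j)] = "heterotrait-monomethod"    # different trait, same method
            else:
                kinds[(i, j)] = "heterotrait-heteromethod"  # different trait and method
    return kinds


# Hypothetical design: three traits measured with two methods.
labels = [(t, m) for m in ("A", "B") for t in ("T1", "T2", "T3")]
kinds = classify_cells(labels)
```

Because the labels are unique, two cells with the same trait must differ in method, so the first branch only ever catches validity values.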
For the MAAS-MI General, the convergent validity of History-taking, Presenting Solutions, Structuring and Interpersonal Skills is clearly supported by the strength of the correlations, whereas Exploring Reasons for Encounter and Communication Skills fail to provide evidence of convergent validity.
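Campbell and Fiske's first criterion requires validity values to be significantly different from zero and sufficiently large. A hedged sketch of the significance part uses the Fisher z transformation, a large-sample approximation that becomes rough when the number of students is small; the function name and the sample size of 30 are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def validity_value_significant(r, n, alpha=0.05):
    """Two-sided test of H0: rho = 0 using the Fisher z transformation.

    z = atanh(r) * sqrt(n - 3) is approximately standard normal under H0.
    """
    z = atanh(r) * sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


# Hypothetical validity values from a sample of 30 interviews.
strong = validity_value_significant(0.50, 30)
weak = validity_value_significant(0.20, 30)
```

Significance alone does not settle "sufficiently large"; that part of the criterion remains a substantive judgment.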
Essentially the same conclusions can be drawn for the self-evaluation variant of the MAAS-MI General (MAAS-MI Self) and for the MAAS-MI Global Expert-Rating Scale (MAAS-MI Global), whereas for the MAAS-MI Global Self-Rating Scale (MAAS-MI Global-Self) insufficient evidence for convergent validity is obtained, except for Interpersonal Skills.
Divergent validity of dimensions in medical interviewing skills is established for History-taking, Presenting Solutions, Structuring and Interpersonal Skills, whereas difficulties arise in distinguishing Exploring Reasons for Encounter and Communication Skills. Furthermore, MAAS-MI General appears to be the most effective in discerning dimensions, followed by MAAS-MI Global and MAAS-MI Self, whereas MAAS-MI Global-Self is unable to distinguish dimensions. However, Exploring Reasons for Encounter may well be poorly distinguished because of limitations of the global measures.
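One of Campbell and Fiske's divergent-validity checks (their second criterion) asks that each validity value exceed the heterotrait-heteromethod correlations in its own row and column of the same heteromethod block. The sketch below is a hypothetical operationalization on an invented 3-trait x 2-method matrix; the labels and the function `criterion_two` are assumptions.

```python
# Hypothetical 3-trait x 2-method correlation matrix (not data from this study).
LABELS = [(t, m) for m in ("A", "B") for t in ("T1", "T2", "T3")]
R = [
    [1.00, 0.70, 0.60, 0.50, 0.20, 0.10],
    [0.70, 1.00, 0.80, 0.25, 0.55, 0.15],
    [0.60, 0.80, 1.00, 0.10, 0.20, 0.45],
    [0.50, 0.25, 0.10, 1.00, 0.30, 0.20],
    [0.20, 0.55, 0.20, 0.30, 1.00, 0.40],
    [0.10, 0.15, 0.45, 0.20, 0.40, 1.00],
]


def criterion_two(r, labels):
    """Does each validity value exceed the different-trait values in its row and column?"""
    n = len(labels)
    results = {}
    for i in range(n):
        for j in range(i + 1, n):
            (ti, mi), (tj, mj) = labels[i], labels[j]
            if ti != tj or mi == mj:
                continue  # not a validity value (same trait, different method)
            # Row of variable i within method mj, excluding the validity cell.
            rivals = [r[i][b] for b in range(n) if labels[b][1] == mj and labels[b][0] != ti]
            # Column of variable j within method mi, excluding the validity cell.
            rivals += [r[a][j] for a in range(n) if labels[a][1] == mi and labels[a][0] != tj]
            results[(ti, mi, mj)] = all(r[i][j] > v for v in rivals)
    return results


check2 = criterion_two(R, LABELS)
```

In this toy matrix all validity values pass the second criterion even though they fail the third, illustrating how heteromethod comparisons can look favourable while same-method correlations are inflated.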
Moreover, MAAS-MI General displays the best measurement properties because it is subject to less method variance than the other methods. Halo, leniency and restriction of range tend to diminish the measurement properties of MAAS-MI Global and MAAS-MI Global-Self and, partly, of MAAS-MI Self.
MAAS-MI General displays the best evidence of convergent and divergent validity
All in all, MAAS-MI General appears to be the best method of measurement of medical interviewing skills because it displays evidence of convergent and divergent validity, and is minimally influenced by the method of measurement.
Campbell DT, Fiske DW. Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 1959; 56: 81-105.
Cronbach LJ, Meehl PE. Construct validity in psychological tests. Psychological Bulletin, 1955; 52: 281-302.
Crijnen AAM, Thiel J van, Kraan HF. Evaluatie van consultvoering: een spreekuur nagebootst (Evaluation of a medical consultation: simulating consultation hours). Huisarts en Wetenschap, 1986; 29: 316-318.
Dielman TW, Hull AL, Davis WK. Psychometric properties of clinical performance ratings. Evaluation and the Health Professions, 1980; 3: 103-117.
Eisenthal S, Lazare A. Expression of patient’s request in the initial interview. Psychological Reports, 1977; 40: 131-138.
Eisenthal S, Koopman C, Lazare A. Process analysis of two dimensions of the negotiated approach in relation to satisfaction in the initial interview. Journal of Nervous and Mental Disease, 1983; 171: 49-54.
Fiske DW. Measuring the concepts of personality. Aldine Publishing Company, Chicago, 1971.
Gonnella JS. Evaluation of clinical competence (editorial). Journal of Medical Education, 1985; 60: 70-71.
Jöreskog KG, Sörbom D. LISREL IV: A general computer program for estimation of linear structural equation systems by maximum likelihood methods. University of Uppsala, Uppsala, 1978.
Katz FM. Trends in assessment (Editorial). Medical Education, 1982; 16: 61-62.
Kerlinger FN. Foundations of behavioral research. Holt, Rinehart and Winston, Inc., New York, 1981.
Magnusson D. Test theory. Addison-Wesley, Reading, Massachusetts, 1967.
Marsh HW, Hocevar D. Confirmatory factor analysis of multitrait-multimethod matrices. Journal of Educational Measurement, 1983; 20: 231-248.
Saal FE, Downey RG, Lahey MA. Rating the ratings: assessing the psychometric quality of rating data. Psychological Bulletin, 1980; 88: 413-428.
Schmitt N, Stults DM. Methodology review: analysis of multitrait-multimethod matrices. Applied Psychological Measurement, 1986; 10: 1-22.
Streiner DL. Global rating scales. In: Neufeld VR, Norman GR (Eds.). Assessing clinical competence. Springer Publishing Company, New York, 1985.
Thorndike RL. Applied Psychometrics. Houghton Mifflin Company, Boston, 1982.