4.1 Content Validity in Mental Health Confirmed

This study addresses the content validity of MAAS Mental Health. Does MAAS adequately support the medical interviewing skills needed for primary mental health care?

According to mental health experts, good interviews are characterised by well-developed interpersonal and communicative skills, a thorough exploration of the reasons for encounter, and a well-structured exchange. These are all key components of MAAS.

Experts also favour a basically non-directive interviewing style, interspersed with periods of directive and systematic data-gathering to generate and test hypotheses. Again, such an approach is enabled by MAAS Mental Health.

Kraan, H. F., & Crijnen, A. A. M. (1987). Content validity and the MAAS-Mental Health. In H. F. Kraan & A. A. M. Crijnen (Eds.), The Maastricht History-taking and Advice Checklist – studies on instrumental utility (pp. 279–303). Lundbeck, Amsterdam.

Examining content validity

The core question in this study of content validity reads as follows:

Is the content concerning – initial – medical interviewing skills in mental health care adequately operationalized in the MAAS-Mental Health items? 

Unfortunately, no procedure for empirical quantitative assessment of content validity has been recognised in the literature (De Groot, 1961). Therefore, we propose an indirect procedure for the study of content validity shown in Table 1.

This procedure runs as follows:

  • The starting point is the construction of the MAAS-Mental Health.
  • Scoring of a set of residents’ interviews with the MAAS-MH by trained observers with the assessment of item reliabilities by generalizability coefficients.
  • Scoring of the same set of interviews by experts who use the MAAS-MI Global Expert-Rating Scales measuring the same dimensions of interviewing which also underlie the MAAS-MH. Item reliabilities are calculated as Pearson correlations between pairs of experts.
  • Comparison of both sets of scores by correlating the MAAS-scores on the scale level with the corresponding items of the MAAS-MI Global Expert-Rating Scale.
  • Inspection of the residents’ scores on the MAAS-MI MH for the representativeness of the original theoretical concepts in their scoring patterns.

Although all steps of this procedure should be checked for correctness, we carry out this check on three points. The most important step in this procedure, however, the operationalization of the content of interviewing into items, cannot be investigated directly. When these checks turn out to be positive, they support the content validity of the MAAS-MI MH.

In checking these steps, the three following questions are asked (indicated by their number in Table 1): 

  • What interviewing skills are used by residents in general practice?
    • How representative is our educational model of interviewing skills as operationalized in the MAAS-MI MH for this group?
    • These questions are answered by studying the interviewing skills residents in general practice display during the simulated consultation. By this approach, content validity on the item level is investigated. 
  • Does each of the items pertaining to a single interviewing skill show sufficient reliability?
    • To answer this question from the perspective of content validity, the sources of unreliability should be explored. Are the theoretical concepts as described in MAAS & Related Skills adequately operationalized, resulting in agreement between observers, or is the item construction insufficient in a technical sense, thus acting as a source of error?
    • These questions are answered by using a generalizability design which allows the study of the different components of variance in the MAAS-MI MH-scores on the item level.
  • Do the scales of the MAAS-MI MH reflect the theoretical dimensions of initial interviewing in mental health care?
    • In a correlative study, MAAS-MI MH-scores are compared with the ratings by experts over the same interviews.
    • In this approach, content validity research takes place on the scale level.
Table 1 -- Procedure to Study the Content Validity of the MAAS Mental Health

Content Validity & Physicians’ Interview Behavior in Mental Health

A contribution to content validity of the MAAS-MH emerges from the scoring patterns of residents in general practice. These patterns of scoring provide a picture of the interviewing skills of this group. 

The contribution of these scoring patterns to the content validity is based on two rather self-evident arguments:

  • In their under- and postgraduate education, the residents have been made more or less familiar with the theoretical aspects of interviewing in Primary Mental Health Care (see also Meeting Patients).
  • In their present postgraduate training, the residents have to cope at a primary care level with the mental health problems of their patients. 

Method

We studied the frequency of positive MAAS-MH item scores of the residents in both cases of the simulated consultation hour. A picture of their interviewing skills thus emerges. As a criterion for infrequently used interviewing skills, we have taken the mean minus 1 standard deviation as a cutting point. Cutting points of one standard deviation above or below the mean are common in group-referenced educational measurement as used at Maastricht University Medical School. This implies that interviewing skills with a p-value below 0.16 count as infrequently used.
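The cutting point can be illustrated with a minimal sketch. It assumes that item scores are coded 0/1 and that the value 0.16 reflects the share of a normal distribution lying more than one standard deviation below the mean; the function names and the example data are hypothetical and are not taken from the study.

```python
from statistics import NormalDist

CUTOFF = 0.16  # cutting point quoted in the text

def item_p_values(scores):
    """Proportion of positive (1) scores per item.

    scores: list of interviews, each a list of 0/1 item scores.
    """
    n_interviews = len(scores)
    n_items = len(scores[0])
    return [sum(row[i] for row in scores) / n_interviews for i in range(n_items)]

def infrequently_used(p_values, cutoff=CUTOFF):
    """Indices of items whose p-value falls below the cutting point."""
    return [i for i, p in enumerate(p_values) if p < cutoff]

# Share of a normal distribution lying more than 1 SD below the mean.
share_below_minus_1sd = NormalDist().cdf(-1.0)

# Hypothetical data: 10 interviews scored on 3 items.
example = [[1, 0, 1]] + [[1, 0, 0]] * 9
p = item_p_values(example)
flagged = infrequently_used(p)
```

In this made-up example the second and third items would be flagged as infrequently used, while the first item, scored positively in every interview, would not.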

Results

Table 2 shows which of the percentage scores (p-values) of the physicians’ interviewing skills are below 0.16.

In general, all interviewing skills are used in the scale Exploring Reasons for Encounter except for questions about coping with problems in the past and about the impact of the problems on others.

The History-taking scale is also generally supported, but two important items, one about the factors maintaining problems and one about functionality (gains), are under-used.

In the scale Psychiatric Examination, scores on examination of disturbances in consciousness, thought, perception and memory are lacking, because these items have barely been used in the consultations of the residents with simulated patients (see also Scalability & Reliability). 

In the scale Socio-emotional Exploration, many items (11 out of 18 Rasch homogeneous items) show low percentage scores. Conspicuous are the few questions asked about aggressive, affectionate and religious feelings.

Physicians ask few questions about aggressive, affectionate and religious feelings

In addition, questions on caring, responsibility, substance (ab)use, sexual functioning, housing and financial situation and developmental issues are scarce. This finding shows that residents do not adhere strongly to the preferred theoretical style of initial interviewing in primary mental health care which recommends a balanced combination of a general, non-directive style, with directive or systematic questioning. 

The scale Presenting Solutions shows low percentage scores in some interviewing skills which have to do with the negotiation process: conveying information about pros and cons of the treatment plan, discussion of differences in problem-definition and inviting the patient to make a choice from different treatment alternatives. These findings might shed some light on the imperfect style of negotiation of residents in general practice. 

In the scale Structuring, the items on the consultation plan and on the check whether the most important problems have been discussed are under-used. This scale is otherwise generally supported. 

The same holds in general for the scales Interpersonal Skills and Communicative Skills. The interviewing skills operationalized in these items are generally used by the residents. Only items about the feelings of the patient during the consultation and about meta-communicative comments are under-used.

Conclusion: Content Validity Supported by General Practitioners’ Interview Behavior

In their 15-20 minute interviews with simulated patients, residents show a variability in their interviewing behavior which generally covers the item domain of the MAAS-MH. In this way, the content validity of the MAAS-MH is supported by the interviewing pattern of the residents in general practice.

There are, however, some exceptions:

  • Items concerning hypothesis generation and functionality of the problems are under-used during History-taking;
  • In addition, a considerable number of items from the scale Socio-emotional Exploration, particularly emotion-related items, are under-used;
  • Finally, the items on the negotiation process in presenting solutions are infrequently scored.

The last two findings may be due to the fact that the simulated situation does not invite the resident to explore emotional topics or to negotiate with the patient.

These minor deviations from our educational model seem to point to deficiencies in the interviewing styles of residents, or to restrictions of the simulated situation, rather than to the lack of content validity of the MAAS-MH. 

Content Validity & Item Reliability

When an item is unreliable, the theoretical content covered is not appropriately represented in the measurement method. The consequence is theoretical loss during measurement. Although these statements show the relationship between reliability and content validity, item reliability studies are not a comprehensive check on the process of operationalization of theory into items. They are one step in the study of content validity (see Table 1).

Sources of Unreliability

A generalizability design enables the detection of the sources of unreliability in items. In the MAAS-scores, variance components may be discerned such as variance:

  • Due to the physicians’ skills,
  • Due to the observers’ interpretation of the items, and
  • Due to interactive effects between physicians and observers.

The latter two variance components are to be considered as sources of unreliability. Moreover, it is important to note that we have used Rasch homogeneous scales. 

In the conclusion of this section, we summarize the content validity on the item level including the theoretical loss already suffered by exclusion of items during Rasch analysis to obtain homogeneous scales. 

Method

  • Firstly, we computed the generalizability coefficients over each of the 104 items of the MAAS-MH. We have already justified this type of reliability research on the item level (see also Scalability & Reliability). We again used the same design as described in that chapter, where 20 videotaped interviews (10 for each case), all rated by 6 observers, provide the data for generalizability analysis.
  • Secondly, an analysis of variance was carried out where physicians and observers were taken at random, but each item was, of course, fixed. In this way, 3 components of variance originate concerning, respectively, the physicians’ ability, the observers’ interpretation and their interaction component, including error.
  • Thirdly, we inspected the generalizability coefficients and the variance components caused by the observers. The observer component reflects systematically different interpretations of an item by the observers. For each item, this component was compared with its p-value in Table 2. This source of unreliability is more serious when the item is regularly used (cutting point p-value .16 as in Table 2).
  • Finally, we considered the theoretical loss caused by the removal of unreliable items. As a minimum of reliability, we took a generalizability coefficient of 0.35, a criterion given by Mitchell (1979), calculated over 2 observers, as we base the validity studies of the MAAS-MH on MAAS-MH-scores resulting from summed ratings by two independent observers.

All tables are presented at the end of this chapter.
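The item-level analysis described in this Method section can be sketched as follows for a single item. This is only an illustration under stated assumptions: a fully crossed physicians-by-observers design with one score per cell, variance components estimated from the expected mean squares of a two-way random-effects analysis of variance, and a relative generalizability coefficient in which the residual component is divided by the number of observers. The source does not give the exact formula used, and all function names are hypothetical.

```python
import numpy as np

MIN_G = 0.35  # minimum generalizability over 2 observers, as quoted in the text

def variance_components(scores):
    """Estimate variance components for one item.

    scores: 2-D array, rows = physicians (interviews), columns = observers.
    Returns (physician, observer, interaction-plus-error) components.
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_o = x.shape
    grand = x.mean()
    ss_p = n_o * np.sum((x.mean(axis=1) - grand) ** 2)
    ss_o = n_p * np.sum((x.mean(axis=0) - grand) ** 2)
    ss_res = np.sum((x - grand) ** 2) - ss_p - ss_o
    ms_p = ss_p / (n_p - 1)
    ms_o = ss_o / (n_o - 1)
    ms_res = max(ss_res / ((n_p - 1) * (n_o - 1)), 0.0)
    # Negative estimates are truncated at zero.
    var_p = max((ms_p - ms_res) / n_o, 0.0)
    var_o = max((ms_o - ms_res) / n_p, 0.0)
    return var_p, var_o, ms_res

def g_coefficient(var_p, var_res, n_observers=2):
    """Relative generalizability coefficient for scores averaged over observers."""
    denom = var_p + var_res / n_observers
    return var_p / denom if denom > 0 else float("nan")

def observer_share(var_p, var_o, var_res):
    """Observer component as a proportion of the total variance."""
    total = var_p + var_o + var_res
    return var_o / total if total > 0 else float("nan")

# Hypothetical item: 3 physicians rated by 2 observers, purely additive effects.
demo = [[0, 1], [1, 2], [2, 3]]
vp, vo, vres = variance_components(demo)
g = g_coefficient(vp, vres)
share = observer_share(vp, vo, vres)
```

Under these assumptions, an item would be retained when `g` is at least `MIN_G`, and `share` corresponds to the observer percentages discussed in the Results.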

Results

Scrutinizing Table 2 for items showing a generalizability coefficient lower than 0.35, we discuss the theoretical loss that takes place when these items are removed from the Rasch homogeneous scales. 

Exploration of the Reason for Encounter

In the scale Exploration of the Reason for Encounter, 6 out of 13 items would be dropped, among them the reason for visit, thorough exploration of the complaints and symptoms (from the patient’s frame of reference) and exploration of the emotional impact of the problem upon the patient himself and his important others. In these items, the variance component attributed to the observers is substantial (>20%). The size of this component may explain their low generalizability coefficients. Furthermore, the items about the consequences of the problems for daily life and the item about problem solving and coping in the past are unreliable. Some of these items represent a considerable loss of information: the emotional impact of the problem and its significance for daily life. This loss is harmful to the completeness of patient-centered information, a major aim of this phase.

History-taking

In the scale History-taking, the theoretical loss mainly concerns the exploration of the conditions pertaining to aetiology (factors that have increased, decreased or maintained the problem or complaint) and factors that have to do with the functions (e.g. gains) of the complaint. With this loss, this scale becomes somewhat vulnerable in measuring the data collection by the physician in order to generate explanatory and treatment hypotheses.

Psychiatric Examination

In the scale Psychiatric Examination, evaluation of the content validity aspects by means of reliability criteria is hampered by the restrictions of our experimental situation. Since our patients did not present clinical pictures with marked disturbances in perception, thought, memory and consciousness, some of the items pertaining to this symptomatology show very little or no variance. Consequently, analysis of variance could not be performed to provide data for generalizability analysis. In the end, 5 out of 9 items would be removed from this scale, among which are theoretically important items pertaining to the aforementioned symptomatology. We do not know the consequences of this loss. Reliability of these items should be assessed in interviews in which this symptomatology is present. However, these findings do not immediately imply that the aim of this scale, i.e. measurement of the thoroughness of psychiatric examination, is not attained. The restricted number of unreliable items suggests that this item format is promising for evaluation of the thoroughness in psychiatric examination.

Socio-emotional Exploration

In the scale Socio-emotional Exploration, the exclusion of unreliable items is substantial (7 out of 18). The theoretical loss mainly concerns emotional aspects (exploring feelings of aggression, responsibility, caring and future expectations) and, to some extent, biographical data collection. This unreliability correlates with the low frequency with which these topics have been raised by the physician during the interview and may be due to threshold problems in scoring (see previous paragraph). 

Presenting Solutions

The scale Presenting Solutions, although almost entirely Rasch homogeneous (13 out of 15 items), suffers from a general unreliability on the item level. The subsequent theoretical loss pertains to both functions dealt with during this phase of the interview: firstly, the conveyance of information on causal conditions of the problem, the rationale and pros/cons of the treatment plan and, secondly, the negotiation between physician and patient about problem definition, treatment methods and goals. Five items of this scale show a considerable (>20%) observers’ variance component, indicating a definition disagreement between observers which can be corrected.

Structuring

In the scale Structuring the interview, a considerable theoretical loss has already taken place during Rasch analysis (3 items out of 8). The low reliabilities on the item level in particular concern the items on the structuring in the sequence of the phases in the initial interview. In analogy with the previous scale, it may be concluded that experts agree on some structuring of the phases, e.g. the introduction and closure of topics, but disagree on the fixed sequence of the phases Exploration of the Reason for Encounter, History-taking, Psychiatric Examination, Socio-emotional Exploration and Presenting Solutions.

Interpersonal Skills and Communicative Skills

The scales Interpersonal Skills and Communicative Skills both suffer from a deficient reliability on the item level, especially the latter. Although both scales are entirely Rasch homogeneous, in the scale Interpersonal Skills, 7 out of 10, and in the scale Communicative Skills, 6 out of 7 (!) items would be removed because of unreliability.

Looking closer, strikingly low reliability is found in items pertaining to facilitative behavior, such as proper history-taking and proper pacing of the interview (interpersonal skills), concretising, conveying information in small units, checking of understanding, proper confrontations and comprehensible language (communicative skills). Also conspicuous is the high number of items with a substantial observers’ variance component (in interpersonal skills: 4 and in communicative skills: 3), again pointing to definition and criterion problems.

We end this section with two more general observations:

  • Firstly, when the averaged generalizability coefficients of the Rasch homogeneous items are compared with the generalizability coefficients on the scale level, several discrepancies are noted. The coefficients on the scale level are higher than the averaged coefficients on the item level, except for the scales Exploration of the Reasons for Encounter, Structuring the interview and Communicative Skills.
  • Secondly, there are many regularly used items with substantial (>16%) observers’ variance components in the scales Exploration of the Reason for Encounter (5 out of 13), Presenting Solutions (7 out of 13), Structuring the interview (2 out of 5), Interpersonal Skills (5 out of 10) and Communicative Skills (4 out of 7). 

Discussion

Reviewing the items in the scales after a removal of non-Rasch homogeneous and unreliable items (see Table 3 for an overview of the remaining items), we reassess the content validity on the scale level: 

  • In the Exploration of the Reasons for Encounter scale, there is theoretical loss in interviewing behavior pertaining to the emotional aspects of the problems and their significance for daily life. The loss of emotional aspects from the interviewing behavior is even more marked in the scale Socio-emotional Exploration.
  • In the History-taking scale, a qualitatively important loss concerns the items measuring the physicians’ interviewing behavior serving clinical problem-solving.
  • The scale Psychiatric Examination has promising features as to the measurement of thoroughness in exploration of symptomatology, though some loss took place due to artefacts in the experimental situation.
  • The findings concerning the Presenting Solutions scale are somewhat ambiguous. The reliability on the item level is often low, but the scale has a considerable Rasch homogeneity, evidencing an increase in consistency and stability when one moves from the item level to the more abstract scale level.
  • Structuring the interview shows a low to moderate reliability on the item level. The reasons for this may be two-fold:
    • First, the phases in the interview are more difficult to distinguish in mental health. We have already argued that the dividing line between doctor- and patient-centered information is difficult to draw, as is the line between History-taking and Socio-emotional Exploration. In general, interviewers use a more variable, less structured interviewing style.
    • Second, observers might not adhere to our operationalization of the items and induce a source of unreliability.
  • Finally, the scales Interpersonal Skills and a fortiori Communicative Skills suffer from a low to moderate reliability on the item level although both are Rasch homogeneous. We have given an explanation for this discrepancy.

We conclude this section with three more general remarks:

  • Firstly, most scales, except Exploration of the Reasons for Encounter, Structuring the interview and Communicative Skills, show an upward jump in reliability when going from the item to the scale level. Apparently, observers adhere more strongly to our concepts of interviewing on the broader and more globally defined scale-level than on the stricter item level.

Reliability jumps upward when going from the item- to the scale-level

  • Secondly, we have seen that a considerable number of items in the scales measuring process skills suffer from unreliability caused by a high observer variance component. This finding, based on systematic observers’ biases, is repairable by improving item definitions and criteria, as well as ameliorating the observers’ training.
  • Thirdly, we repeat once more that with this reliability approach to content validity study, we are not able to infer conclusions about the quality of the operationalization of concepts into items. All statements made so far pertaining to a lack of content validity or theoretical loss, may also be explained by deficits in operationalization which we cannot detect directly, as we argued at the beginning of these paragraphs.

Content Validity & Experts’ Opinion

The question is whether, in the opinion of experts, the MAAS-MH scales reflect initial interviewing in mental health care. The procedure of construction of the MAAS-MH by a core group of experts has been described in MAAS MI Construction. The content validity of the MAAS-MH is now studied by comparing MAAS-MH-scores with global expert-ratings of the same interviews.

Method

An expert panel, different from the core group of experts involved in the construction of MAAS-MH, evaluated the set of 80 videotaped interviews which had been scored previously by trained observers with the MAAS. The expert panel consisted of 4 psychiatrists, 1 third-year resident in psychiatry, 1 andrologist, 2 social workers and 1 psychiatric nurse. They were all experienced in primary mental health care and in the practical training of undergraduate students and residents. 

To evaluate the videotaped interviews, the panel used the MAAS-MI MH Global Expert-Rating Scale, a 9-item, evaluative 5-point Likert scale (see Instruments).

  • The items are very globally defined, asking an opinion about subjects’ exploration of the reason for encounter, history-taking, presenting solutions, structuring the interview, interpersonal and communicative skills.
  • History-taking is measured by two items: History-taking to generate and test explanatory hypotheses and History-taking to generate treatment hypotheses. In this way, experts are invited to evaluate history-taking from a diagnostic and a therapeutic point of view.
  • This instrument ends by asking for an overall evaluation of the interview.

With this MAAS-MI MH Global Expert-Rating Scale:

  • The experts rated 80 videotaped interviews (40 residents in general practice each interviewing 2 simulated patients). Each of these 80 interviews was rated by 2 experts randomly drawn from the panel, yielding 160 rated interviews.
  • The 80 interviews were also scored twice with the MAAS-MH by trained observers: the first time live during the interviews, the second time a few months later from the videotapes.

In order to reduce the observers’ source of unreliability, the MAAS-MI MH Global Expert-Rating of both experts, as well as the scores of both observers with the MAAS-MH, have each been summated. 

The correlations between the summated MAAS-MH scores on the scale level and the summated MAAS-MI MH Global Expert-Ratings are studied to assess the degree to which the experts support the MAAS-MH-scores. The magnitude of the correlations serves as a measure of content validity. Special attention is given to the question of whether experts actually make the theoretical distinction between interpersonal and communicative skills. In addition, we examine the issue of what experts have in mind when they consider an interview to be good.
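The summation and correlation steps can be sketched as below. The sketch assumes that each method yields one score per interview and per scale or item, held in arrays with interviews as rows; the function names and array layout are hypothetical and only illustrate the kind of matrix reported for the comparison of both methods.

```python
import numpy as np

def summed_scores(rater_a, rater_b):
    """Sum the scores of two raters, interview by interview."""
    return np.asarray(rater_a, dtype=float) + np.asarray(rater_b, dtype=float)

def cross_correlations(maas, expert):
    """Pearson correlations between every MAAS-MH scale and every expert item.

    maas:   array of shape (n_interviews, n_scales)
    expert: array of shape (n_interviews, n_items)
    Returns an (n_scales, n_items) matrix; entry [i, j] correlates
    MAAS-MH scale i with expert item j.
    """
    maas = np.asarray(maas, dtype=float)
    expert = np.asarray(expert, dtype=float)
    n_scales = maas.shape[1]
    # Columns are variables; the expert columns are appended after the scales.
    full = np.corrcoef(maas, expert, rowvar=False)
    return full[:n_scales, n_scales:]

# Hypothetical data: 4 interviews, 2 MAAS-MH scales, 1 expert item.
maas_demo = summed_scores([[1, 2], [1, 1], [2, 1], [2, 0]],
                          [[0, 2], [1, 2], [1, 1], [2, 1]])
expert_demo = summed_scores([[1], [2], [3], [4]], [[1], [2], [3], [4]])
matrix = cross_correlations(maas_demo, expert_demo)
```

When the scales and the expert items are listed in the same order, the diagonal of the returned matrix holds the same-trait correlations that the text treats as validity coefficients.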

We especially examined whether experts in mental health distinguish interpersonal from communicative skills 

However, before studying these validity coefficients, we make some remarks about the MAAS-MI MH Global Expert-Rating Scale in terms of its inter-rater reliability and internal consistency. First, we studied the inter-rater reliability by calculating the Pearson correlation between the item scores of the randomly combined pairs of experts who rated the 80 videotaped interviews. Second, Cronbach’s alpha for the Global Rating Scale was calculated for the 160 rated interviews.
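Both reliability figures can be illustrated with a short sketch. It uses the standard formula for Cronbach's alpha, k/(k-1) times one minus the ratio of summed item variances to the variance of the total score; the function names and the example data are hypothetical.

```python
import numpy as np

def inter_rater_r(expert_1, expert_2):
    """Pearson correlation between two experts' scores on one item."""
    return float(np.corrcoef(expert_1, expert_2)[0, 1])

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (n_ratings, n_items) array of item scores."""
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - item_variances.sum() / total_variance))

# Hypothetical data: perfectly consistent experts and perfectly consistent items.
r_demo = inter_rater_r([1, 2, 3, 4], [2, 3, 4, 5])
alpha_demo = cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

The inter-rater correlation is computed per item across interviews, whereas alpha is computed across the items of the scale, which is why the two figures can diverge.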

Results

The reliability figures of the MAAS-MI MH Global Expert-Ratings (Table 4) are Pearson correlations between the item scores of randomly combined pairs of experts. To simplify the analyses, we combined both history-taking items of the MAAS-MI MH Global Expert-Rating Scale (explanatory and treatment hypotheses) into one item. These reliability figures are moderate, but one has to take into account that each reliability is calculated over a single item, which is a demanding condition. The reliability figures for the items History-taking, Presenting Solutions and Structuring the interview are reasonable. Cronbach’s alpha of the whole scale is 0.79, which is good for a 7-item rating scale.

The good coefficient alpha and the moderate inter-rater reliability of this method contrast with each other. This contrast points to high method covariance in the items, indicating that differences between the traits will not be accurately measured.

Next we turn to Table 5 showing the correlation matrix between MAAS-MH scales and the MAAS-MI MH Global Expert-Ratings of the same traits. In the MAAS-MH, we have combined the scales History-taking with those of Psychiatric Examination and Socio-emotional Exploration to correlate both methods better with each other. 

  • Experts give support to the combined History-taking dimension of the MAAS-MH (r=.42).
  • They also support the scales Interpersonal Skills and Communicative Skills (resp. r=.27 and r=.25), and Structuring (r=.22), but they fail to do this with the scales Exploration of the Reason for Encounter and Presenting Solutions.
  • Moreover, the experts’ notion of Exploration of the Reasons for Encounter correlates with the MAAS-MH trait of Interpersonal Skills and Communicative Skills (resp. .27 and .33).
  • In contrast, the Exploration of the Reason for Encounter of the MAAS-MH bears no correlation at all with the experts’ trait of Interpersonal and Communicative Skills.
  • The Exploration of the Reason for Encounter of the MAAS-MH also shows a substantial correlation with the experts’ trait of History-taking (.29). 

Interpersonal & Communicative skills

Whether experts support the theoretical distinction between interpersonal and communicative skills (Hess, 1969), is studied in Table 6.

  • It shows that experts support the scales pertaining to the combination of interpersonal and communicative skills as measured by the MAAS-MH (0.42).
  • However, experts do not make the theoretical distinction between interpersonal and communicative skills which has been elaborated in The Medical Interview & Related Skills and which has been operationalized in the MAAS-MH. Their validity coefficients (resp. 0.27 and 0.25) are about the same as the correlations of communicative and interpersonal skills measured by the different methods (0.18 and 0.29).
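The comparison in the last point can be made explicit with a small sketch using the four coefficients quoted above. The decision rule, that each same-trait correlation should exceed every cross-trait correlation between the two methods, is a simplifying assumption introduced here for illustration and is not stated in the source; the function name is hypothetical.

```python
def distinction_supported(validity, cross_trait):
    """True only if every same-trait correlation exceeds every cross-trait one."""
    return min(validity) > max(cross_trait)

# Same trait measured by both methods (validity coefficients from the text).
validity_coefficients = [0.27, 0.25]
# Different traits measured by the different methods (from the text).
cross_trait_correlations = [0.18, 0.29]

supported = distinction_supported(validity_coefficients, cross_trait_correlations)
```

With these values the rule is not met, which matches the reading in the text that experts do not separate the two skill sets.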

The question of which characteristics of the interview (in terms of MAAS-MH scales) experts have in mind when they consider the initial interview as good is answered in Table 7.

It turns out that the experts’ overall evaluation of the interview correlates significantly, though moderately, with the MAAS-MH scales History-taking, Socio-emotional Exploration, Interpersonal and Communicative Skills. No significant correlations are found with the scales Exploration of the Reason for Encounter, Psychiatric Examination, Presenting Solutions and Structuring the interview.

The question of which characteristics experts have in mind themselves when they consider an initial interview to be good, is addressed in Table 8. These correlations, which are generally substantial (.36 to .67), suggest that, according to the experts’ opinion, the quality of good initial interviewing in primary mental health care is mainly based on good Interpersonal and Communicative Skills and on the characteristics Exploration of the Reasons for Encounter and Structuring the Interview.

According to experts in Mental Health, good interviews are characterised by Interpersonal and Communicative Skills as well as by Exploration of Reasons for Encounter and Structuring

Discussion

Restrictions of global measures

In this section, we compare the experts’ evaluations of interviewing skills with the MAAS-MH scales. Since this evaluation has been measured with the MAAS-MI Global Expert-Rating Scale, we have to take its restrictions into account. Although there is a widespread belief in the validity of expert judgment of medical competence, we have found the necessary condition for that validity in our MAAS-MI Global Expert-Rating Scale, i.e. its inter-rater reliability, to be moderate. The validity of experts’ judgment of interviewing should therefore be treated with some reserve. Even experts fail to agree when items are ill-defined or have no clearly described criteria for rating. These findings agree with Streiner’s (1985) comments on global-rating scales: their reliability and validity are hampered by halo effects, idiosyncratic use by raters, and the restriction of being able to measure no more than two dimensions.

Reviewing the correlations between MAAS-MH scores and expert ratings in the light of these shortcomings, experts moderately support the MAAS-MH on the scale level.

Experts in Mental Health support the content validity of MAAS-MI Mental Health scales

Disagreement between experts and the MAAS constructors

However, some remarks should be made. Between experts and the constructors of the MAAS, a conceptual difference can be noted in the scale Exploration of the Reasons for Encounter. From our theoretical stance, this concept pertains to patient-centered information necessary to clarify the request for help, i.e. the way the patient wishes to be helped to fulfil his needs in seeking professional help (Lazare et al., 1975).

However, experts consider Exploration of the Reason for Encounter, as we operationalized it in the MAAS-MH, as an extension of History-taking, aiming to collect patient-centered information. This finding is also supported by the substantial correlation between the Exploration of the Reason for Encounter dimension measured by the MAAS-MH and the History-taking dimension of the MAAS-MI Global Expert-Ratings.

On the other hand, the experts’ own concept of the Exploration of the Reason for Encounter is significantly correlated with the MAAS-MH operationalizations of the Interpersonal and Communicative Skills; in other words, with factors enhancing an interview climate of trust and acceptance in order to promote a mutual exchange of information.

In our view, this incongruence is caused by the unclear meaning of the concept request for help amongst experts who, however, frequently use the term, apparently to denote some of the aforementioned factors in the interview. In our definition, the request for help means the way in which the patient desires his problems to be solved or his needs to be fulfilled.

Don’t forget to explore the request for help, meaning the way the patient desires his problem to be solved or fulfilled 

Interpersonal and Communicative Skills & A Good Interview

In addition, it is remarkable that the theoretically relevant distinction between Interpersonal and Communicative Skills (Hess, 1969) is not made by experts, although this distinction is clearly defined in the MAAS-MI Global Expert-Rating Scale. Regrettably, our concepts of Presenting Solutions and Structuring the interview are not strongly confirmed. This finding may be due to the low to moderate reliability of these MAAS-MH scales on the item level.

Finally, we find some empirical support for the generally propagated style of good initial interviewing in primary mental health care (see also Medical Interview & Related Skills). This style, which is in principle patient-centered, allows the patient to tell his story in his own words while the interviewer follows in a non-directive way. This pattern should be interrupted by periods of more structured, systematic questioning when hypotheses are to be tested. These periods are directive and physician-centered. We find empirical support for this statement in the correlations of the experts’ overall evaluation of the interview with their own global ratings and with the MAAS-MH ratings of the different traits.

In Mental Health, patient-centered interviewing should be interrupted by periods of more structured and systematic questioning to generate and test hypotheses 

When experts consider the initial interview as well-conducted, they have in mind the dimensions History-taking in a broad sense (physician-centered, directive) as well as Interpersonal and Communicative Skills (non-directive, patient-centered).

Conclusions on Content Validity in MAAS-MI Mental Health

Content validity of the MAAS-MH has been investigated on item and scale levels. 

Content validity of the MAAS is difficult to study:

  • A direct investigation into whether the item domain of the MAAS-MH is representative of initial medical interviewing in primary mental health care is not possible.
  • Moreover, the quality of the operationalization of content and process of medical interviewing skills into items is almost impossible to assess. 

We approximated content validity by means of a three-step procedure:

  • Firstly, we investigated the score profiles of residents in general practice with the MAAS-MH in mental health interviews. These score profiles generally support the content validity.
  • Secondly, we studied which theoretical losses the MAAS-MH suffered from unreliability on the item level.

The reliability of items measuring interviewing skills which pertain to process aspects (interpersonal and communicative skills, the ability to structure the interview and to present solutions) is rather low. Due to this lack of reliability, the MAAS-MH suffers some theoretical and conceptual loss in the measurement of the process aspects of the interview. Generalizability analysis of the items pertaining to interviewing skills from each of the three phases of the interview yields satisfactory results. Theoretical loss is consequently restricted to the items of the scales Structuring, Interpersonal Skills and Communicative Skills. Fortunately, reliability on the scale level is proportionally much higher, except for Exploration of the Reasons for Encounter, Structuring and Communicative Skills. This finding suggests that observers endorse our theoretical concepts on the scale level rather than their operationalizations in items.

  • Thirdly, we compared experts’ judgements of important dimensions of medical interviewing with the scores on MAAS-MI scales intended to measure the same dimensions. Experts support the MAAS-MH on the scale level with the exception of the scales Exploration of the Reasons for Encounter and Presenting Solutions.
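The gap between item-level and scale-level reliability noted in the second step can be illustrated with a small Python sketch. The study itself reports generalizability coefficients; the Spearman-Brown formula used here is a swapped-in classical analogue that assumes parallel items, and the function name and the numbers are hypothetical.

```python
def scale_reliability(item_reliability, n_items):
    """Spearman-Brown: reliability of a scale built from n parallel items."""
    if not 0.0 <= item_reliability <= 1.0:
        raise ValueError("item_reliability must lie between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * item_reliability / (1 + (n_items - 1) * item_reliability)


# Hypothetical example: items with a reliability of only 0.20 each.
for n in (1, 5, 10, 20):
    print(n, round(scale_reliability(0.20, n), 2))
# 1 0.2
# 5 0.56
# 10 0.71
# 20 0.83
```

Under these assumptions, a scale can be acceptably reliable even when each of its items is not, which fits the pattern that reliability on the scale level is proportionally higher than on the item level. Scales with few items gain less from aggregation.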

Our conceptualization of the scale Exploration of the Reason for Encounter is taken for an extension of history-taking with the collection of patient-centered information, rather than for the exploration of the request for help

Experts favor basically non-directive interviewing interrupted by periods of directive and systematic data-gathering to generate and test hypotheses

The experts’ support for content validity also seems to favor a generally propagated interviewing style: basically non-directive, interrupted by periods of more directive and systematic data-gathering in order to generate and to test hypotheses.

Table 2 -- Generalizability Coefficients on the Scale Level, Item Generalizabilities, Observer Variances and Score Patterns by Residents in General Practice
Table 3 -- Review of numbers of items per MAAS-scale after Rasch analyses and removal of unreliable items
Table 4 -- Inter-rater reliability of the MAAS-MI Global Expert-Rating Scale by calculating Pearson’s correlations between randomly combined pairs of expert-raters
Table 5 -- Correlations between the scores on 6 MAAS-MI MH Scales with the corresponding items on the MAAS-MI Global Expert-Rating Scales
Table 6 -- Correlations between Interpersonal and Communicative Skills and their combination measured by the MAAS-MI MH with the same traits measured by the MAAS-MI Global Expert-Rating Scale
Table 7 -- Correlations of experts’ evaluation of the interview as a whole with the traits measured by the MAAS-MI MH
Table 8 -- Correlations of the experts’ overall evaluation of the initial interview with their own measurements of the different traits

References

Groot AD de. Methodologie. Mouton, ’s-Gravenhage, 1961.

Hess JW. A comparison of methods for evaluating medical student skills in relating to patients. Journal of Medical Education, 1969; 44: 934-938.

Lazare A, Eisenthal S, Wasserman L. The customer approach to patienthood. Attending to patient requests in a walk-in clinic. Archives of General Psychiatry, 1975; 32: 553-558.

Mitchell SK. Interobserver agreement, reliability and generalizability of data collected in observational studies. Psychological Bulletin, 1979; 86: 376-390. 

Streiner DL. Global rating scales. In: Neufeld VR, Norman GR (Eds.). Assessing clinical competence. Springer Publishing Company, New York, 1985.