1.4 MAAS Medical Interview Construction

While putting together MAAS medical interviews for general and mental health, we arrived at two goals for our interview skills model:

  • The skills should form an ideal model of the initial interview;
  • The model could be used in education for instruction, self-evaluation and assessment.

We searched in vain for such a model in the literature and therefore decided to develop the MAAS medical interview.

Here we describe methods of measurement for evaluating MAAS medical interviews for general and mental health, followed by other MAAS variants. Finally, we offer practical tips on using MAAS medical interview.

Kraan, H. F., & Crijnen, A. A. M. (1987). The construction of the Maastricht History-taking and Advice Checklist. In H. F. Kraan & A. A. M. Crijnen (Eds.), The Maastricht History-taking and Advice Checklist: studies on instrumental utility (pp. 81–118). Lundbeck, Amsterdam.

+  +  +

How are medical interviewing skills measured?

More precisely, this question can be rephrased as: which methods and which categorizations of interviewing skills are preferably used for measurement?

Behavioral Assessment 

As a method for the measurement of interviewing skills, we have chosen behavioral assessment by external observers, because of its good prospects with regard to reliability, scalability and validity.

Behavioral assessment has been successfully applied in personality research (e.g., Mischel, 1968) and in research into behavior modification (e.g., Cone et al., 1977). The superiority of behavioral assessment over other methods, such as self-assessment and global ratings of behavior, has frequently been investigated (for a discussion see, e.g., Beekers, 1982; Streiner, 1985). The MAAS, the subject matter of this thesis, has therefore been constructed as a behavioral assessment method which is operationalized in rather simple, clearly defined and delimited units of behavior.

MAAS-MI is constructed as a behavioral assessment operationalized in simple, clearly defined and delimited units of behavior

In addition to this method of behavioral assessment of interviewing skills, several other methods are constructed that are used in forthcoming validity studies:

  • Two self-assessment methods of interviewing skills are introduced:
    • MAAS-MI Self: an assessment of detailed, circumscribed interviewing behavior;
    • MAAS-MI Self-Global: a global self-rating scale.
  • MAAS-MI Global: a global expert rating scale used by experts to evaluate the physician’s interviewing skills.
  • Patient Satisfaction with Communication Checklist, an evaluation method used by interviewees (patients).

With these main types, our validity research covers all measurement methods of interviewing skills that are strongly related to the MAAS in the theoretical sense:

  • External observation (behavioral assessment, global rating);
  • Self-assessment (behavioral assessment, global rating);
  • Assessment by the interviewees (patients).

Categorization of Interviewing Skills 

Let us now return to the categorization of interviewing skills. We restrict ourselves to the categorization within the behavioral assessment method which we prefer for the construction of the MAAS.

Distinguishable Categories

Five main observable categories of interviewing skills can be distinguished (e.g., Stiles et al., 1986): 

  • Content categories pertaining to what is said (the semantic content). Content categories range from particular topics of interest (e.g. topics about a specific medication or specific complaints) to grouped topics (e.g. general categories, such as somatic or psychosocial complaints). 
  • Speech-act categories concerning the acts performed when someone says something, as opposed to the content of his words. Example: How long have you been out of work? is a question about the patient’s employment situation, but the act performed by the speaker is the asking of a question and the eliciting of an answer from the patient. 
  • Non-verbal communicative behaviors, e.g., voice tone, gaze, posture, laughter, hesitation, facial expression. 
  • Ratings of affect and evaluative ratings of complex interviewing skills:
    • In the former, the rater judges the emotional tone in (a part of) the interview;
    • In the latter, the rater judges how well a complex interviewing skill has been performed. 
  • Conjunctive categories which combine one or more elements of the above-mentioned categories into one single category. For instance: Asking questions about the perception of the complaints is an example of a ‘conjunctive category’, combining content and speech act categories. 

After listing observable categories, we take a closer look at how these categories are sampled from medical interviews.

Coding & Rating

Stiles and Putnam (1986) distinguish two types of sampling: coding and rating.

  • Coding

Coding involves the use of nominal scales to categorize bits of content (semantic elements) or bits of behavior (e.g., open questions; closed questions).

    • This coding is complete when every utterance, sentence, interviewing skill or semantic element (whichever category is being coded) during the interview is scored.
    • Coding is incomplete when, proceeding from certain scoring criteria, only predefined bits of content or behavior are selected and coded; for instance, only the closed questions in the interview and not the other types of questions.

The distinction between complete and incomplete coding is, however, always relative, because coding can never be complete in a philosophical sense. When incomplete coding is used, threshold problems in scoring may arise (Rutter et al., 1981; Stiles et al., 1986). Even when observers acknowledge the coding criteria of a particular type of interviewing behavior or content element, they may have difficulty agreeing on whether a specific behavior is occurring or not. Example: raters may agree on the criteria by which an expression of empathy should be recognized. The problem starts when raters have to agree on whether the amount of empathy in a pertinent expression is sufficient for it to count as a codable expression of empathy (Rutter et al., 1981). 
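The distinction between complete and incomplete coding can be made concrete with a small sketch (our own illustration; the utterances and their codes are invented):

```python
# Illustration (not from the instrument): complete vs incomplete coding of
# speech-act categories. Complete coding assigns every utterance a nominal
# code; incomplete coding selects only predefined categories for scoring.

utterances = [
    ("How long have you been out of work?", "closed question"),
    ("Tell me more about the pain.", "open question"),
    ("I see.", "minimal encouragement"),
    ("Does it hurt when you breathe?", "closed question"),
]

# Complete coding: every utterance receives a code.
complete = [code for _, code in utterances]

# Incomplete coding: only utterances matching a predefined category
# (here, closed questions) are selected; all other types are ignored.
incomplete = [text for text, code in utterances if code == "closed question"]

print(len(complete))    # 4: one code per utterance
print(len(incomplete))  # 2: only the closed questions
```

The threshold problem arises at the point this sketch glosses over: deciding whether a given utterance matches the predefined category at all.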

  • Rating

Rating is an attempt to quantify a quality of interviewing behavior (for instance: affective behavior). Likert-scales are often used for rating. 

Measuring Interviewing Skills: Our Choice

The first four categories of the Stiles and Putnam list are mainly coded (completely or incompletely), whereas the fifth category is, by definition, rated on (Likert-type) scales. In our view, Stiles and Putnam’s meta-classification provides a useful overview of the measurement categories encountered in the literature on the measurement of interviewing skills. We would, however, like to include in their speech-act category as much as possible of the more complex skills specific to medical interviews. In this way, we can list complex interviewing skills, such as summarizing, reflection, confrontation etc., in the speech-act category. It is, however, necessary to use rating scales, especially when the quality of complex interviewing skills is to be evaluated.

Nevertheless, a measurement-method serving educational objectives should state in precise behavioral terms how these complex skills should be performed. 

A measurement-method serving educational objectives should state in precise behavioral terms how these complex skills should be performed

Those categories appropriate to our purposes are indicated in Table 1 which presents the matrix of possible measurement categories of interviewing skills. 

Measurement of interviewing skills during the phases Exploring Reasons for Encounter, History-taking and Presenting Solutions asks for conjunctive categories of content and of speech-acts, whereas the skills for Structuring as well as the Interpersonal and Communicative Skills mainly require affective and evaluative ratings. Of the five aforementioned main categories of observable interviewing behavior, we do not use non-verbal behavior, because of difficulties in operationalization.

Table 1 -- Overview of Observable Categories of Medical Interviewing Skills

Criteria for measurement of interviewing skills

The requirements for the measurement of interviewing skills, which are based on the previous discussions, are summarized in a list of 6 criteria. Using these criteria, a literature review of existing methods was subsequently carried out.

  1. The method should be observational with the following selected categories of interviewing skills: 
    • Content categories specific for initial medical interviews in general practice and in primary mental health care;
    • Speech-act categories, pertaining to simple and complex interviewing skills;
    • Affective and evaluative ratings of complex interviewing skills;
    • Conjunctive categories of speech-act and medical content elements. 

For reasons of practicability, incomplete coding is used. 

  2. The focus should be on the physician’s interviewing skills. 
  3. The method should measure interviewing skills which:
    • Can be taught effectively, i.e. those which are susceptible to behavior modification;
    • Are also suitable as a feedback tool in education. 
  4. The method should guarantee reliable measurement. 
  5. Besides content validity, the method should have construct validity. 
  6. Practicability of the method:
    • Reasonable test length and scoring time;
    • Clear definitions and criteria for scoring and for training of observers;
    • Handy lay-out of the instrument and a user-friendly manual.

Literature-search on methods for measuring interviewing skills

Before constructing the MAAS-MI General and the MAAS-MI Mental Health, we reviewed the literature in the hope of finding a method satisfying the above-mentioned criteria. In this section, these methods are discussed and conclusions are drawn from testing them against our pre-set criteria.

Discussion of measurement-methods encountered in the literature 

In the literature we found 22 methods which we summarized in Table 2 at the end of the chapter. 

The discussion of the 22 methods, to which we refer by means of the first authors’ names, takes place according to our criteria preset in the previous paragraph. 


  1. Are the methods and categorizations suitable? 

The majority of methods used in non-evaluative research on the medical interview are based on the Interaction Process Analysis developed by Bales (1950).

Bales originally developed the Interaction Process Analysis for assessment of interactions in small groups. It has since been applied to physician-patient communication. The physician’s and the patient’s interviewing behavior are taken equally into account. As an application of the Interaction Analysis in physician-patient communication, we present Roter’s (1977) modification of Bales’ system. 

As shown in Table 3, presented at the end of this chapter, eight mutually exclusive interactional categories are stipulated for physicians as well as for patients. The differences from Bales’ system are an extension of the speech-act categories (e.g., personal remarks, giving direction, bids for clarification) and the addition of four affective rating scales: anger-irritation; sympathy-kindness; anxiety-nervousness; matter-of-factness-professionalism. 

Interaction Analysis instruments use time-consuming and cumbersome complete-coding systems

Such ‘interaction analysis’ instruments are widely used in research into the medical interview. Their meticulous, complete coding of ‘molecular’ speech-acts enables the researcher to follow closely the process of information-exchange between physician and patient. They have however several disadvantages: 

  • They use very time-consuming, cumbersome complete coding (such as the methods of Adler (1966), Hess (1969) and MacDonald (1981)). 
  • They do not measure complex skills, such as summarizing, confrontation, various types of reflection, self-disclosure, checking of information etc. which are characteristic of medical interviews. The specifics of these complex interviewing skills cannot be retraced by means of these coding systems. 
  • In the same vein, feedback to interviewers about their interviewing skills is restricted. 
  • Medical content aspects concerning history-taking, diagnosis, prognosis, treatment etc. are not taken into account. 

To compensate for some of these disadvantages, some researchers add checklists of content categories to these instruments (e.g., Freemon, 1971; Sprunger, 1983). 

More in accordance with our measurement objectives, the instruments of Hollifield et al. (1957), Van Dorp (1977), Stillman (1980), Hill (1981), Goldberg et al. (1981) use speech-act categories which reflect realistically the character of medical interviewing skills. Moreover, the more practical incomplete coding is used. 

These methods also have their shortcomings: 

  • Threshold problems which are inherent in the use of incomplete coding; 
  • Reliability problems with rating scales; 
  • Absence of medical content aspects. 

Therefore, some of the instruments add medical content elements, for instance, constructing conjunctive categories such as the methods designed by Jarrett (1972), Brockway (1978), Barsky (1980), Rutter (1981), Mumford (1984). 

A special position is taken by the “Dutch” instruments of Mokkink (1982), Den Hoed (1982) and Pieters (1982). They all use evaluative rating scales and pay considerable attention to interviewing skills but, in addition, also measure other competency domains of the physician, such as medical problem-solving, perceptual and interpretative skills, and attitudes. Although they are comprehensive evaluation methods for initial medical consultations, they are not specific to the competency domain of medical interviewing skills.

2. Is the measurement applicable to initial interviewing in General Practice and Mental Health Care?

Content of medical interviews can be typified along two axes:

  • The first axis is the medical discipline of the interviewer, e.g., general practice/primary care, psychiatry, pediatrics;
  • The second axis becomes clear when one considers the negotiated consensus model of Lazare (1975). Two sub-types of interviews can be derived from this model:
    • Initial interviews where the patient’s request is elucidated and further planning for responding to this request is discussed;
    • Follow-up interviews where the physician actually responds to the patient’s request. 

Examples of measurement methods of initial interviews in primary care settings are those of Barsky (1980), Mokkink (1982), Den Hoed (1982) and Pieters (1982). Freemon’s (1971) and Sprunger’s (1983) instruments pertain to pediatrics. 

Several methods are designed for initial interviews in primary mental health care: Jarrett’s (1972), Hill’s (1978), Rutter’s (1981) and Goldberg’s (1981) instruments. 

Some evaluative remarks about these methods: Rutter’s method is very elaborate, but its content checklist pertains to child psychiatry. Hill’s method is especially designed for evaluative counseling sessions. Jarrett’s and Goldberg’s methods are rather restricted in their variety of interviewing skills to be measured.

3. Is the focus on the physician’s interviewing skills?

In general, the Interaction Analysis methods pay equal attention to the physician’s and the patient’s communicative behavior. Methods designed for evaluative purposes obviously focus exclusively on the physician’s interviewing skills, such as those of Hollifield (1957), Hess (1969), Van Dorp (1977), Barsky et al. (1980), Rutter (1981), Goldberg (1981), Mokkink (1982), Den Hoed (1982), Pieters (1982), Sprunger (1982) and Mumford (1984).

4. Are teachable interviewing skills measured?

Among the instruments we reviewed, there is only a small number that exclusively measure teachable interviewing skills and that are suitable for the provision of immediate feedback to students about their interviewing behavior.

Only a few instruments measure teachable interviewing skills that provide immediate feedback

This criterion is fulfilled when the methods are operationalized in rather simple, clearly-defined and delimited interviewing behavior. It only concerns the methods of Brockway (1978) and, to a lesser extent, those of Jarrett (1972), Rutter et al. (1981) and Sprunger (1983). 

Moreover, Table 2 indicates which methods have actually been used to measure the effects of teaching programs, as witnessed by reports in the literature. Not all methods reviewed under this heading have been used in this sense.

5. Are reliability studies available? 

In our literature review, the majority of the instruments encountered have been tested for reliability, mainly inter-rater reliability. Although a host of procedures has been used, such as Pearson’s correlations between raters, weighted kappas, intra-class correlations and percentage agreement, it is a pity that this last measure, which is the weakest in the methodological sense, has been applied the most by far. Perhaps this is why the reported reliabilities so frequently vary from sufficient to good.
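Why percentage agreement is the weakest of these indices can be shown with a small example of our own (the ratings are invented, not drawn from any reviewed study): when one category dominates, much of the raw agreement is expected by chance, which a chance-corrected index such as Cohen’s kappa makes visible.

```python
# Two raters code 10 checklist items as present (1) or absent (0).
# Raw agreement looks high, but most of it is attributable to chance.

def percent_agreement(a, b):
    """Proportion of items on which the two raters give the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = percent_agreement(a, b)
    # Expected agreement under independence of the two raters.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical present/absent codes; 'present' dominates.
rater1 = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
rater2 = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

print(percent_agreement(rater1, rater2))  # 0.8
print(cohens_kappa(rater1, rater2))
```

With eight agreements out of ten, percentage agreement is 0.80, yet kappa is only about 0.38, because 0.68 agreement is already expected by chance.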

6. Are validity studies available? 

Our literature review reveals a scarcity of validity studies. With the Interaction Analysis instruments (Bales, 1950; Freemon, 1971; Roter, 1977) and the Verbal Response Mode of Stiles (1978), studies have been performed that evidence predictive validity: the physician’s interviewing behavior has been shown to explain some variance in outcome variables, such as patient satisfaction and recall of medical information (Inui et al., 1982).

Stillman et al. (1977) have carried out validity research with their ACIR-scale:

  • Convergent validity: the ACIR is able to measure a predicted growth in interviewing skills
  • Divergent validity: the ACIR does not measure medical or scholastic aptitude.

Swanson (1981) has correlated the scores of the ACIR-scale (Stillman, 1980) with a modification of Hess’ instrument (1969) and with the History and Physical Exam Checklist (Swanson et al., 1981). He concludes that it is impossible to find evidence for construct validity, because the low inter-case reliability does not allow comparison between instruments.

In Instrumental Utility Assessed, our desiderata for validity research are elaborated further.

7. Are the methods practicable?

In most descriptions of the methods used in educational evaluation of interviewing skills, remarks are made about the burden of the scoring and of the training of observers. In this respect the methods of Barbee (1967), Hess (1969), Barsky (1980), Stillman (1980), MacDonald (1981), Mokkink (1982), Den Hoed (1982), Pieters (1983), Sprunger (1983) and Mumford (1984) look feasible. 

The Interaction Analysis instruments have, however, a rather time-consuming and cumbersome manner of scoring, because of their complete coding. Moreover, the interview often has to be transcribed in a verbatim protocol before scoring can take place.

Which methods are applicable to our measurement criteria? 

To answer this question we apply our criteria in three successive steps to the reviewed sample. The methods which withstand this procedure meet our measurement requirements.

First, we select methods by applying the criteria ‘observation method and preferable categories of interviewing skills’, ‘measurement of teachable interviewing skills’ and ‘practicability’. Only those methods remain which measure simple and complex interviewing skills, include content categories and are practicable, i.e., use incomplete coding and/or rating scales. The methods of Barbee, Jarrett, Brockway, Barsky, Rutter, Den Hoed, Sprunger, Pieters and Mumford fulfil these criteria. 

Second, several of the remaining instruments fit the requirements of initial interviews in Primary Mental Health Care (Jarrett and Rutter) and General Practice (Barsky, Den Hoed and Pieters). All these instruments also fit the requirement of ‘focus on the physician’s interviewing behavior’. 

Third, we examined instrumental utility:

  • The scarcity of validity studies, especially of construct validity, is striking and evidences a low tendency to investigate underlying theories of medical interviewing.
  • Reliability is often investigated, but the research is not very sophisticated. The weak measure of percentage agreement is often used, whereas more robust methods, such as generalizability analysis and probabilistic scale analysis (Thorndike, 1982), have never been reported.
  • None of the methods meets the criterion of ‘available reliability and validity studies’ satisfactorily. 

We conclude, therefore, that none of the reviewed instruments fits our pre-set criteria precisely.

The consequence of this review is the construction of a new measurement method for interviewing skills in General Practice and one for Mental Health problems. Constructing a new instrument means, on the one hand, referring back to the theoretical digressions of chapters one and two and, on the other hand, learning lessons from the useful experiences of the reviewed authors. 

Construction of the MAAS-Medical Interview

According to Thorndike (1982), it is desirable to draw up a test plan when a measurement method is constructed. His recommendations are briefly described as follows: 

  1. Initial definition of the competency domain the method is designed to assess. 
  2. Description of the use of the method (type of subjects, type of decisions on which to base it). 
  3. Constraints within which the method must operate. 
  4. Design of a blueprint: assembling content specifications (topics to be covered, skills to be tapped).
  5. Specification of the format of the items (nature of stimulus materials, type of response to be made, procedure for scoring). 
  6. Plan for try-out of the proposed method, for analyzing the try-out data and for selecting items for the final method. 
  7. Specification of the statistical parameters desired in the finished method. 
  8. Outlining further data-collection and analysis for further reliability and validity studies. 
  9. Organization of the test manual and other auxiliaries.

Checking these steps, we must conclude that much preparatory work has already been done:

  • Steps 1 and 2 are described in Medical Interview & Related Skills;
  • Step 3 is discussed – as far as it is applicable – in this chapter;
  • Steps 4 to 6 are dealt with below:
    • We first outline the procedure of how the blueprint and the final version of the MAAS-G and MAAS-MH have been constructed;
    • We then describe the scales of MAAS-MI G and MAAS-MI Mental Health;
  • Steps 7 and 8 are the subject matter of Instrumental Utility;
  • Step 9, the manual for observers, can be found on this site MAAS-MI G and MAAS-MI MH.

Steps involved in construction: MAAS-MI General

An initial blueprint was designed by a founding panel consisting of two general practitioners, two psychologists and one social psychiatrist. It was based on the theoretical knowledge of process and content in initial medical interviews as described in Medical Interview & Related Skills.

After several revisions, this first blueprint resulted in a second one: a 56-item checklist with 4 scales for the assessment of the physician’s interviewing behavior:

  1. Exploration of the reasons for encounter;
  2. Structuring the interview;
  3. Quality of basic interviewing skills;
  4. Designing a treatment plan.

This second blueprint was tested by the panel: they rated videotaped initial interviews with patients presenting somatic and mental health problems. This testing resulted in:

  • Removal of ambiguously worded items;
  • Developing definitions of interviewing skills and criteria for scoring;
  • Clearer item formulation;
  • Extension and improvement of medical content elements in the items.

A third blueprint was the result of these extensions and improvements. It already revealed the present format of the MAAS with its six scales. It was decided that the method was to reflect the three characteristic phases of initial interviews.

The following are the first 3 scales:

  1. Exploration of the reasons for encounter;
  2. Medical history-taking;
  3. Presenting solutions. 

The following 3 scales represent the physician’s process skills in initial interviews:

  1. Structuring the interview;
  2. Interpersonal skills;
  3. Communicative skills.

This third blueprint was presented to a broader expert panel of general practitioners, psychiatrists, psychologists and sociologists, all faculty staff charged with the education and evaluation of interviewing skills.

The pretests by this panel consisted of the following procedures:

  • The expert panel was invited to comment on the item domain, the item format, the definitions of the interviewing skills and the criteria for scoring described in the manual for observers;
  • The expert panel was asked to score 2 test-videotapes of interviews of about 20 minutes: one simulated a patient presenting a somatic problem, the other a mental health problem. The panel members were expected to discuss their scores and to reach a consensus in two sessions. During these sessions, guided by 2 members of the founding panel, comments and scoring problems were monitored.

These sessions resulted in several adaptations of the third blueprint which finally became MAAS-MI General.

Steps involved in construction: MAAS-MI Mental Health

The third blueprint of the MAAS-General was taken as a starting point. The founding panel then constructed items mainly based on the content aspects of initial medical interviews in mental health care. This content dimension is described in Medical Interview & Related Skills. This procedure resulted in the addition of 2 content-specific scales, Socio-emotional Exploration and Psychiatric Examination, to the original 6 scales of the MAAS-MI G. 

The resulting blueprint was judged by another, broader panel consisting of about 15 experts in the field of primary mental health care (social psychiatrists, psychologists, social workers, general practitioners). These experts were also educators and researchers in their roles as faculty members. Their efforts resulted in adaptations of the item domain, yielding the final version of the MAAS-MI MH. The items of the 8 scales of the MAAS-MI MH are extensively described in the following paragraphs. 

Item description MAAS-MI General 

In this section we describe the 6 scales of this method:

  • Exploration of Reasons for Encounter
  • History-taking
  • Presenting Solutions
  • Structuring
  • Interpersonal Skills
  • Communicative Skills. 

The theoretical material which furnishes the item domain is derived from Medical Interview & Related Skills, where the process and content dimensions of these interviewing skills are described.

The items of the 6 scales and their criteria for scoring are described under Tools

Exploration of Reasons for Encounter 

In this phase, the physician gives the patient the opportunity to describe his complaints and symptoms in his own words. The patient expands on the causes and consequences of the complaints and the events which triggered the visit to the physician. Further questions may be asked about attempted solutions and about discussions of the complaints in the primary group. 

The appropriate process aspects in this phase are open questions, probes into the patient’s frame of reference, active listening, emotional reflection, stimulating summarizations. These process skills, summarized in the term ‘exploration’, are assessed in the scales Interpersonal Skills and Communicative Skills (see below). 

The items in the scale Exploration of Reasons for Encounter belong to a conjunctive category, combining speech-acts and content elements. 

The items are edited in a format such as Asks for … (content topic) or Explores … (content topic)

Scoring is on a two-point scale: present or absent. The score present should be given when – according to the criteria specified in the manual for observers – the pertinent topic is asked about or explored. 

History-taking 

During this phase of the interview, the physician asks the patient questions from his medical frame of reference in order to collect information for his diagnostic and clinical reasoning process.

The content of the items reflects questions about aspects of the complaints/problems: description of the nature of the complaint, intensity, localization, course through day-time, etc. Psychosocial factors are also operationalized in the items: questions about psychological functioning, quality of interpersonal relationships etc. The 22 items of this scale belong to a conjunctive category of speech-act (mainly types of questions) and content elements. 

The process skills used in this phase are mainly closed and directive questions, sometimes in a short series. These process skills are measured on the scales Interpersonal Skills and Communicative Skills.

The scoring is on a two-point scale: present or absent. Format and scoring of the items is similar to the previous scale.  

Presenting solutions 

This phase follows both previous phases and – if carried out – the physical examination.

  • The physician informs the patient about his condition or problem, causes and prognosis of his disease.
  • He then proceeds with an exploration of the patient’s feelings, evoked by this information.
  • A negotiation of the problem definition between physician and patient may ensue.
  • The physician then makes a proposal for follow-up: further exploration or investigations, referral, treatment, preventive advice. Alternative proposals may be given by the physician and again negotiation may follow.
  • Finally, the physician gives concrete advice based on the outcome of the negotiation process.
  • The physician concludes with appointments for follow-up. 

The 12 items describing the skills during this phase belong to a conjunctive category of speech-act and content elements and may take various formats: Provides information about … (medical information), Discusses … (medical information), Explains the effects of … (medical information), Explains why … (medical information). 

The scoring is on a two-point scale: present or absent

Structuring 

This scale, comprising 8 items, is intended to measure the skills by which the physician opens and closes the interview, sets an agenda, and links the aforementioned three phases. These items consist of conjunctive categories combining a complex interviewing skill with content elements. Example: Begins the Presenting Solutions phase with the provision of information on the problem definition or diagnosis.

The scoring is on a two-point scale: present or absent

Interpersonal Skills and Communication Skills 

These skills, which are not connected to a specific phase, are not easy to operationalize in concrete behavior or to furnish with criteria that result in reliable scoring. 

Interpersonal skills are operationalized in 8 items. These items pertain in particular to those interviewing skills by means of which the physician approaches the emotional aspects of the interview. 

The items of the scale Communication Skills are rooted in the interviewing skills by means of which the physician starts and maintains the information-flow from and to the patient.

The format of these items is a three-point evaluative rating scale. In general, in every item, a number of criteria should be fulfilled. Items are scored with No, Indifferent and Yes in proportion to the number of criteria fulfilled. 

These criteria, which can be either qualitative or quantitative, yield two types of items:

  • Qualitative Criteria

These items are scored by applying each criterion to the whole interview. The number of criteria positively fulfilled determines the ultimate score: No, Indifferent or Yes.

For instance, the item on Facilitation has 5 global criteria, judging the interview as a whole, as to: 

  • Quality of the open questions 
  • Presence of active listening 
  • Quality of probing within the patient’s frame of reference 
  • Facilitative self-disclosure 
  • Minor, stimulating remarks.

The quality of each criterion is judged according to its effect on facilitating the patient to tell his own medical story. The ultimate score is Yes when 4 or 5 criteria are fulfilled, Indifferent in the case of 2 or 3 criteria, and No in the case of fewer than 2. This kind of scale calibration is called ‘behavioral anchoring’; it is claimed to enhance reliability in global rating scales (Streiner, 1985).
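A minimal sketch of this anchoring rule, with an illustrative function name, maps the number of fulfilled criteria onto the three-point rating:

```python
def anchored_score(criteria_met: int) -> str:
    """Behaviorally anchored three-point rating for a qualitative item,
    e.g. Facilitation with its 5 global criteria."""
    if criteria_met >= 4:    # 4 or 5 criteria fulfilled
        return "Yes"
    if criteria_met >= 2:    # 2 or 3 criteria fulfilled
        return "Indifferent"
    return "No"              # fewer than 2 criteria fulfilled

print(anchored_score(3))  # Indifferent
```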

  • Quantitative Criteria

These items are scored in a manner best illustrated by an example: the item Uses closed-ended questions in a proper way. Every closed-ended question is judged against the criteria in the manual, resulting in a count of properly and improperly used closed-ended questions. The score Yes, Indifferent or No is given when, respectively, 80% or more, 60-80%, or 60% or less of the closed-ended questions are used in a proper way.
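This counting rule can be sketched as follows; the function name is illustrative, and because the stated bands (‘80% or more’, ‘60-80%’, ‘60% or less’) overlap at the edges, the handling of exactly 60% and 80% here is an assumption:

```python
def proper_use_score(proper: int, total: int) -> str:
    """Three-point score from counts of properly and improperly used
    closed-ended questions (proper out of total)."""
    if total == 0:
        return "Not applicable"   # assumption: no closed-ended questions asked
    proportion = proper / total
    if proportion >= 0.80:        # 80% or more used properly
        return "Yes"
    if proportion > 0.60:         # between 60% and 80%
        return "Indifferent"
    return "No"                   # 60% or less

print(proper_use_score(7, 10))  # Indifferent
```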

Item description MAAS-MI Mental Health 

Broadly speaking, the MAAS-MI Mental Health consists of the MAAS-MI General extended by 2 scales: Psychiatric Examination and Socio-emotional Exploration. These scales contain items which are content-specific to primary mental health care. The theoretical base of these items is stated in Medical Interview & Related Skills.

Moreover, some new items have been added to the other scales and some items have been reallocated from one scale to another. The format and scoring of items are similar to those used in the MAAS-MI G. We therefore only briefly discuss the 8 scales of the MAAS-MI MH (see the Instruments for the items and the criteria for scoring).

Exploring Reasons for Encounter 

In comparison with the MAAS-MI General this scale has been extended by 5 items to a total of 13 items.

  • First, the item Asks the patient to describe his complaints/problems has been reallocated from the scale History-taking. The reason for this reallocation is that in mental health problems, symptoms, complaints and problems are, in the patient’s perspective, strongly interwoven; separating them is often considered rather artificial.
  • An item about ‘recent life events’ has also been added. This gives an impression of the intensity of stressful ‘life events’.
  • In addition, two items concerning the patient’s problem-solving or coping mechanisms and his wishes regarding future changes have been constructed.
  • Finally, an item on the impact of the patient’s problem or complaint on members of his primary group has been added.

History-taking 

This scale has been reduced to a total of 13 items; the items with psychosocial content have been relocated to the new scale Socio-emotional Exploration. This scale, in combination with the two following scales, Psychiatric Examination and Socio-emotional Exploration, measures the extent to which the physician more or less systematically scans the psychosocial domain in order to generate explanatory and action hypotheses with respect to the patient’s problem.

Psychiatric Examination

This scale has 6 composite items which reflect the physician’s interviewing skills pertaining to the collection of information for the psychiatric examination, i.e. the symptoms and signs level.

These six composite items cover the important psychiatric diagnostic groups in mental health care:

  • Affective disorders
  • Anxiety related disorders
  • Disturbances in consciousness and orientation
  • Disturbances in memory
  • Disturbances in sensory perception
  • Disturbances in thought. 

The scoring of these items is elucidated by the following example: 

Explores anxiety 

  1. Character and intensity: Yes/No;
  2. Anxiety (fear) of objects: Yes/No;
  3. Anxiety-provoking or releasing factors: Yes/No;
  4. Consequences of anxiety: Yes/No.

The total score over sub-items 1-4 represents a measure of the depth to which the symptom anxiety has been explored.
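A minimal sketch of this depth score, with sub-item names paraphrased from the example above, is simply a count of the sub-items scored Yes:

```python
# Each sub-item of 'Explores anxiety' scored Yes contributes one point.
sub_items = {
    "character_and_intensity": True,
    "anxiety_of_objects": False,
    "provoking_or_releasing_factors": True,
    "consequences_of_anxiety": True,
}
depth_score = sum(sub_items.values())  # ranges from 0 (shallow) to 4 (deep)
print(depth_score)  # 3
```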

Socio-emotional Exploration 

This scale reflects a broad area of socio-emotional functioning. The items pertain to:

  • The emotional functioning of the patient;
  • Norms and values in taking responsibility;
  • Exploration of relationships in family or primary groups;
  • Social support;
  • Functioning in profession and education;
  • Cultural conflicts;
  • Financial and housing situation;
  • Substance (ab)use;
  • Developmental issues until adolescence.

The 20 items are scored on a 2-point scale: Yes/No. They require a certain depth of exploration, according to the criteria stated in the manual, in order to be scored Yes.

Presenting solutions 

This scale has 15 items and is an extended version of the similar scale in the MAAS-MI G. Four new items have been added: 

  • Concerning the degree of responsibility the patient will take for his treatment;
  • Concerning the patient’s opinion of the proposed help;
  • Concerning the impact of important others on the proposed help;
  • Concerning the opportunity the physician has provided for making a choice between proposed alternative solutions.

One item has been removed from the similar scale of the MAAS-MI G, because its content is covered by the newly included items. These added items arise from the concept of psycho-role (Siegler et al., 1976).

Structuring

This scale is the same as that used in the MAAS-MI G. Its similarity is based on the discussion in Medical Interview & Related Skills.

Interpersonal Skills and Communication Skills

These scales are the same as those used in the MAAS-MI G. Their similarity is based on the discussion in Medical Interview & Related Skills.

Variants and extensions of MAAS-MI G and MAAS-MI MH 

Starting from the item domain of the MAAS-MI G and MAAS-MI MH, we also constructed methods which can be used by the interviewer for self-evaluation, methods which can be used by experts, and methods which measure the patient’s responses in the communication.

The objectives of their construction are threefold:

  • First, such methods may serve to provide feedback during education.
  • Second, they can be used for evaluative purposes.
  • Third, in validity research, we would like to answer the question whether interviewers themselves and external observers measure the same underlying concepts of interviewing skills.

The following instruments were constructed:

  • The MAAS-MI SELF for behavioral self-assessment;
  • The MAAS-MI Global Self-Rating Scale for a global self-assessment;
  • The MAAS-MI Global Expert-Rating Scale of medical interviewing skills for a global assessment by experts;
  • The Obtained Information checklists, which extend the MAAS-MI G and MAAS-MI MH by registering the patient’s responses to the physician’s questions or topics raised on the patient’s initiative.

We start, however, with theoretical considerations of self-evaluation and global rating scales.

Theoretical considerations of self-evaluation and global rating scales 

The capacity for self-evaluation has long been considered as a hallmark of professionals, its main objective being changed behavior of the self-raters (Arnold, 1982). In our case, an improvement in medical interviewing skills may be expected. These expectations are based on data from literature on behavioral modification (Mahoney et al., 1973; Kazdin, 1974). 

Self-evaluation

Self-evaluation is considered to measure non-cognitive abilities, such as interviewing skills, attitudes and rapport with the patient, rather than cognitive aspects such as medical knowledge (Arnold et al., 1985).

  • Validity research reveals that self-evaluations of medical students and residents are generally lower than the ratings which they receive from faculty and peers (Morton et al., 1977; Stuart et al., 1980), but controversial findings are also reported. Sclabassi et al. (1984) report, for example, that medical students over-estimated as well as under-estimated their own knowledge and skills after an anaesthesia clerkship.
  • Reliability of self-evaluation methods seems to be acceptable according to a test-retest method (Linn et al., 1975).
  • In addition, self-evaluation ratings show greater variation within competence dimensions than faculty ratings (Stuart et al., 1980).
  • Accordingly, factor analysis of self-evaluation ratings yields a picture that is differentiated into more competence aspects than that arising from faculty ratings (Kolm et al., 1985).
  • Finally, students’ self-evaluations correlated with concurrent grades, faculty assessments of the students and peer ratings at a modest yet significant level (Morton et al., 1977). 

It is suggested that the accuracy of self-evaluation can be enhanced by practice and experience with self-evaluation and by the use of unambiguous (behavioral) criteria for assessment (Stuart et al., 1980). 

Global Rating Scales 

The second topic of this section is global rating scales, which are widely applied to the measurement of medical competency because of some major advantages. Their ability to tap soft areas, their unobtrusiveness, their low cost of development and application, and their potential for feedback are frequently mentioned (a.o. Streiner, 1985).

Reliability and validity studies of these scales, though relatively scarce, also show disadvantages of this method:

  • Validity of global rating scales is generally hampered by the lack of reliability, which always sets the upper limit to validity. Moreover, concurrent and predictive validity compared with other measures of clinical proficiency is generally disappointing (a.o. Donnelly et al., 1978; Streiner, 1985).
  • Inter-rater reliability is often low, because raters observe different behavior or experiences, rate on different criteria and have difficulties in agreeing on the meaning of particular numerical scores (Levine et al., 1975).
  • In addition, most global rating scales consist of multiple scales, each measuring a separate dimension, but each having a low reliability (Dielman et al., 1980).
  • Finally, global rating scales suffer from halo- and leniency-effects which hamper a multi-faceted differentiated judgment of subjects. It therefore seems unlikely that raters can accurately assess more than two dimensions of performance (Streiner, 1985).

In his review article, Streiner (1985) makes several recommendations for the improvement of the reliability and validity of global rating scales.

  • Rating scales should be provided with behavioral anchors: points on the continuum of the scale ought to be linked to behavioral criteria.
  • The fineness of the scale should be chosen on the basis of the rater’s ability to discriminate between levels of performance.
  • The raters should be trained in the use of the scale, the definitions of the terms and different points of the continuum on the scale.
  • Ebel (1951) recommends, on statistical grounds, averaging the ratings of several raters in order to improve reliability.

We take these recommendations into account in the construction of the Global Self-Rating Scale and the Global Expert-Rating Scale.

All instruments are in TOOLS.

MAAS-MI Variants


MAAS-MI G: Self-Evaluation

This method has the same format as the MAAS-MI G. Its items are re-edited in an I-format. For instance, the item Asks the patient what attempts he has made to solve the problem has been re-edited into I asked the patient what attempts he has made to solve the problem. The method is used in the Convergent and Divergent Validity studies.

MAAS-MI G: Global Self-Rating Scale

The items of this rating scale correspond to the scales of the MAAS-MI G. They have the following format: I adequately performed the Exploration of the Reasons for Encounter, e.g. the complaints and their meaning for the patient have been elucidated. This method is used in the Convergent and Divergent Validity studies with the MAAS-MI General and Mental Health. Because these global rating scales comprise only a restricted number (7-8) of items, each item is expanded with theoretical content.

MAAS-MI G: Global Expert-Rating Scale

The items of this rating scale, also known as the MAAS-Global, correspond to the scales of the MAAS-MI G. The scale has the following format: The physician adequately performed the Exploration of the Reasons for Encounter, e.g. the complaints and their meaning for the patient have been elucidated. The method is used in the convergent and divergent validity study with the MAAS described in Convergent and Divergent Validity studies. Because the global rating scales comprise only a restricted number (7-8) of items, each item is expanded with theoretical content.

MAAS-MH: Self-Evaluation

This method has the same format as the MAAS-MI G. Its items are re-edited in an I-format. For instance, the item Asks the patient what attempts he has made to solve the problem has been re-edited into I asked the patient what attempts he has made to solve the problem. The method is used in Convergent and Divergent Validity studies with MAAS-MI Mental Health. 

MAAS-MH: Global Self-Rating Scale

The items of this rating scale correspond to the scales of the MAAS-MI MH. They have the following format: I adequately performed the Exploration of the Reasons for Encounter, e.g. the complaints and their meaning for the patient have been elucidated. The method is used in the convergent and divergent validity study with the MAAS-MI described in Convergent and Divergent Validity studies. Because these global rating scales comprise only a restricted number (7-8) of items, each item is expanded with theoretical content.

MAAS-MH: Global Expert-Rating Scale

The items of this rating scale correspond to the scales of the MAAS-MH. They have the following format: The physician adequately performed the Exploration of the Reasons for Encounter, e.g. the complaints and their meaning for the patient have been elucidated. The method is used in the convergent and divergent validity study described in Convergent and Divergent Validity studies with MAAS-MI Mental Health. Because these global rating scales comprise only a restricted number (7-8) of items, each item is expanded with theoretical content.

MAAS-MI: Obtained Information-checklists in MAAS-MI G and MAAS-MI MH

Checklists, reflecting the content of the patient’s utterances during initial medical interviews, have been constructed. In these checklists, items are constructed that reflect the content of the questions stated in the original items of the MAAS-MI G and the MAAS-MI MH.

As an illustrative example, take from the scale Exploring Reasons for Encounter the item Asks the patient what attempts he has made to solve the problem. This item corresponds in the checklist Obtained Information to the item Own attempts to solve the problem. These Obtained Information checklists have been constructed parallel to the scales Exploring Reasons for Encounter, History-taking and Socio-emotional Exploration.

For scoring of these items, a 2-point scale is used (present/absent). The criteria for scoring depend on the research or evaluation situation in which the MAAS is used. When used in field settings, these items should be rather open, specifying the content in general terms. In this thesis, however, the MAAS is used with simulated patients whose roles are programmed in detail. This circumstance allows for a more circumscribed definition of the content aspect of the items.

Referring to the example described above, the item on the patient’s side may be formulated, according to the pertinent role of the simulated patient, as: I took my sleeping pills and drank a glass of hot milk (in a case of sleep disturbance). In this example, both elements, sleeping pills and a glass of hot milk, should be mentioned by the patient for the item to be scored ‘present’. When the conjunction or is used between the two elements, mentioning one of them is sufficient to score ‘present’.
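The and/or scoring rule can be sketched as below; the function, its substring matching and the mode parameter are illustrative simplifications of the manual’s criteria:

```python
def obtained_information_score(expected, mode, patient_utterances):
    """Score an Obtained Information item as 'present' or 'absent'.
    mode='and': every expected element must be mentioned by the patient;
    mode='or' : mentioning any single element suffices."""
    text = patient_utterances.lower()
    mentioned = [e for e in expected if e.lower() in text]
    if mode == "and":
        return "present" if len(mentioned) == len(expected) else "absent"
    return "present" if mentioned else "absent"

utterance = "I took my sleeping pills and drank a glass of hot milk"
print(obtained_information_score(["sleeping pills", "hot milk"], "and", utterance))  # present
```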

The purpose of these checklists Obtained Information is to measure:

  • Directly: the quantity and quality of the information obtained by the physician;
  • Indirectly: it may yield information about initiatives taken by the patient, talkativeness of the patient, etc.

These checklists are described and used in a study of Construct Validity MAAS-MH.

Use of the MAAS

The MAAS-MI G and MAAS-MI MH can be used in educational evaluation and theoretically orientated research into the properties of medical interviews and medical interviewing skills. We end this section with a description of the training of observers, a necessary condition for use of the MAAS. 

For evaluation in educational settings 

Formative evaluation

The MAAS is used in formative evaluation in order to give students an assessment of their interviewing skills and to recommend improvements where needed. These evaluations are made by peers, by experts or by the students themselves (and do not lead to decisions concerning study progress). In the latter case, the MAAS-MI Self may be used. Students can also use the MAAS as a taxonomy of the skills which will be trained during the medical curriculum.

Summative evaluation

The MAAS is also used in summative evaluations. These examinations have consequences for the students’ progress through the medical curriculum. In such instances, the MAAS with its Obtained Information checklist extensions can be used as a medical interviewing test.

For test purposes, the MAAS-G can also be used in abbreviated versions:

  • First, a selection of items fitting the Rasch model (see Scalability MAAS-MI G and Scalability MAAS-MI MH) can be taken.
  • Second, items may be randomly selected from the whole item domain of the MAAS-MI G. It is vital that the selection is previously unknown to the student; otherwise the student could arrange his interviews so as to display behavior that artificially increases his scores on the MAAS, at the cost of validity.

As no internal criterion of an insufficient or sufficient interview is available, we had recourse to group-referenced measurement, taking the student’s own peer group as reference (Wijnen, 1971). The critical threshold is the mean of the summed scores of the group’s members minus one standard deviation. Scores below this threshold are considered insufficient.
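This group-referenced cut-off can be sketched as follows; the peer-group scores are invented for illustration, and whether the sample or the population standard deviation is intended is an assumption here (the sample version is used):

```python
from statistics import mean, stdev

def group_referenced_threshold(summed_scores):
    """Critical threshold: group mean of summed MAAS scores minus one
    (sample) standard deviation (after Wijnen, 1971)."""
    return mean(summed_scores) - stdev(summed_scores)

peer_group = [62, 58, 71, 65, 54, 68, 60, 66]   # illustrative summed scores
cutoff = group_referenced_threshold(peer_group)
insufficient = [s for s in peer_group if s < cutoff]
print(round(cutoff, 1), insufficient)  # 57.4 [54]
```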

For research purposes 

In addition to reliability and validity research with simulated patients, as extensively treated in this site, the MAAS is also used in field settings, for example in general practice. The mode of use of such naturalistic studies is similar to that with simulated patients. Such studies are, however, beyond the scope of this thesis. 

Training of observers 

To obtain reliable scores, observers must undergo an initial training and, later, refresher training.

The initial training takes two 3-hour sessions and encompasses the following activities:

  • Preparatory reading of the manual for observers, which consists of a short introduction to the MAAS (construction, theoretical background, measurement rationale), a list of definitions of interviewing skills and a list of criteria for scoring per item;
  • Discussion of this documentation during the first session;
  • Use of the MAAS in scoring a videotaped interview, while the videotape is played in periods of 1 or 2 minutes; after every period, possible scores are discussed;
  • During the second 3-hour session, one videotaped interview is scored in periods of 5 minutes and discussed as to the possible scores;
  • Finally, videotaped interviews are viewed in their entirety and scored afterwards; results are compared and discussed after scoring.

Refresher training is needed when the observer has not used the MAAS for a period of two months:

  • A quick re-reading of the list of definitions and of the criteria for scoring each item, as well as the scoring of one test videotape with discussion afterwards, is necessary;
  • The required time does not exceed one hour of preparation.

Scoring the MAAS-MI G or MAAS-MI MH takes no more than 20 minutes of a trained observer’s time. About half of the items can be scored during the interview and the other half after it. Thus the entire time spent by the observer in scoring equals the duration of the interview plus approximately 10 minutes.

Table 3 -- Roter’s Categories for Interactional Process Analyses

Table 2 -- Twenty-Two Methods of Measuring Medical Interviewing Skills Reviewed According to Pre-set Criteria

References 


Adler LM, Enelow AJ. An instrument to measure skill in diagnostic interviewing: a teaching and evaluation tool. Journal of Medical Education, 1966; 41: 281-288. 

Arnold L. Self-evaluation in undergraduate and graduate medical education. Proceedings of the 20th Annual Conference on Research in Medical Education. Washington DC, 1981. 

Arnold L, Willoughby TL, Calkins EV. Self-evaluation in undergraduate medical education: a longitudinal perspective. Journal of Medical Education, 1985; 60: 21-28. 

Bales RF. Interaction Process Analysis. Addison Wesley, Cambridge, 1950. 

Barbee RA, Feldman S, Chosy LW. Quantitative evaluation of student performance in the medical interview. Journal of Medical Education, 1967; 42: 238-243. 

Barsky AJ, Kazis LE, Freiden RB, Goroll AH, Haten CJ, Lawrence RS. Evaluation of the interview in primary care medicine. Social Science in Medicine, 1980; 14A: 653-658. 

Beekers M. Interpersoonlijke vaardigheidstherapieën voor kansarmen. Diss. Swets en Zeitlinger, Lisse, 1982. 

Brockway BS. Evaluation of the physician’s competency: what difference does it make? Evaluation and Program Planning, 1978; 1: 211-220. 

Cone JD, Hawkins RP (Eds.). Behavioral Assessment. Brunner and Mazel, New York, 1977. 

Dielman TE, Hull AL, Davis WK. Psychometric properties of clinical performance rating scales. Evaluation and the Health Professions, 1980; 3: 103-117. 

Donnelly MB, Gallagher RE. A study of the predictive validity of patient management problems, multiple choice tests and rating scales. Proceedings, 17th Annual Conference on Research in Medical Education, Washington DC, 1978. 

Dorp C van. Luisteren naar patiënten; een analyse van het medisch interview. De Tijdstroom, Lochem, 1977. 

Ebel RL. Estimation of the reliability of ratings. Psychometrika, 1951; 16: 407-424. 

Freemon B, Negrete VR, Davis M, Korsch BM. Gaps in doctor-patient communication: doctor-patient interaction analysis. Pediatric Research, 1971; 5: 298-311. 

Goldberg D, Steele JL, Smith C, Spivey L. Training family practice residents to recognize psychiatric disturbances. Final Report. Dept. of Psychiatry, Biometrics and Family Practice, Medical University of South Carolina, 1980. 

Hess JW. Methods for evaluating medical students skills in relating to patients. Journal of Medical Education, 1969; 44: 934-938. 

Hill CE. Counselor verbal response category system. Journal of Counseling Psychology, 1978; 25: 461-468. 

Hoed FE den, Sluys EM. Het meten van “Methodisch Werken”. NHI Utrecht, 1982. 

Hollifield G, Rousell CT, Bachrach AJ, Pattishall EG. A method of evaluating student-patient interviews. Journal of Medical Education, 1957; 32: 853-857. 

Inui TS, Carter WB, Kukull WA, Haigh VH. Outcome-based doctor- patient interaction analysis. I. Comparison of techniques. Medical Care, 1982; 22: 537-549. 

Jarrett FJ, Waldron JJ, Borra P, Handforth JR. The Queen’s University Interviewer Rating Scale (QUIRS). Canadian Psychiatric Association Journal, 1972; 17: 183-188. 

Kazdin AE. Reactive self-monitoring: the effects of response desirability, goal setting and feedback. Journal of Consulting and Clinical Psychology, 1974; 42: 704-716. 

Kolm P, Verhulst J. Comparing self and supervisor evaluation. A different view. Proceedings of the 25th Annual Conference on Research in Medical Education, Washington DC, 1985. 

Lazare A, Eisenthal S, Wasserman L. The customer approach to patienthood. Attending to patients’ requests in a walk-in clinic. Archives of General Psychiatry, 1975; 32: 553-558. 

Levine HG, Gustavson LP, Emery JLR. The effectiveness of various assessment clerkship. Proceedings, 14th Annual Conference on Research in Medical Education, Washington DC, 1975. 

Linn BS, Arostegui M, Zeppa R. Performance rating scale for peer and self assessment. British Journal of Medical Education, 1975; 9: 98-101. 

MacDonald M, Templeton B. Interpersonal skills assessment technique (ISEE-81). National Board of Medical Examiners, Philadelphia, 1981. 

Mahoney MJ, Moura NG, Wade TC. Relative efficacy of self-reward, self-punishment and self-monitoring techniques for weight loss. Journal of Consulting and Clinical Psychology, 1973; 40: 404-407. 

Mischel W. Personality and assessment. Wiley, New York, 1968. 

Mokkink H, Smits A, Grol A. Prevara: een observatie-instrument voor het handelen van de huisarts in het kader van processen van somatische fixatie. Nederlands Tijdschrift voor Psychologie, 1982; 37: 35-50. 

Morton JB, MacBeth WAAG. Correlations between staff, peer and self assessments of fourth-year students in surgery. Medical Education, 1977; 11: 167-170. 

Mumford E, Anderson D, Cuerdon T, Scully D. Performance-based evaluation of medical students’ interviewing skills. Journal of Medical Education, 1984; 59: 133-135. 

Pieters HM, Jacobs HM. Hulpverlening van huisartsen in opleiding getoetst; een gedetailleerde consult observatie. Medisch Contact, 1983; 38: 1539-1542. 

Roter DL. Patient participation in the patient-provider interaction: the effects of patient question asking on the quality of interaction, satisfaction and compliance. Health Education Monographics, 1977; 5: 281-315. 

Rutter M, Cox A. Psychiatric interviewing techniques: I. Methods and measures. British Journal of Psychiatry, 1981; 138: 273-282. 

Sclabassi SE. Development of self-assessment skills in medical students. Medical Education, 1984; 18: 226-231. 

Siegler M, Osmond H. Models of madness, models of medicine. Harper and Row, New York, 1976. 

Sprunger LW. Analysis of physician-parent communication in pediatric prenatal interviews. Clinical Pediatrics, 1983; 22: 553-558. 

Stiles WB. Verbal response modes and dimensions of interpersonal roles: a method of discourse analysis. Journal of Personality and Social Psychology, 1978; 36: 693-703. 

Stiles WB, Putnam SM. Classification of medical interview coding systems. Paper presented at the International Conference on Doctor- patient Communication. 16-18 Sept. 1986, London, Ontario. 

Stillman PL. Arizona Clinical Interview Rating Scale. Medical Teacher, 1980; 2: 248-251. 

Streiner DL. Global rating scales. In: Neufeld VR, Norman GR (Eds). Assessing clinical competence. Springer Publ. Cie., New York, 1985. 

Stuart MR, Goldstein HS, Snope FC. Self-evaluation by residents in family medicine. Journal of Family Practice, 1980; 10: 639-642. 

Swanson DB, Mayewski RJ, Norsen L, Baran G, Mushlin AI. A psychometric study of measures of medical interviewing skills. Proceedings of the 20th Annual Conference on Research in Medical Education, 1981: 308. 

Thorndike RL. Applied psychometrics. Houghton Mifflin Cie., Boston, 1982. 

Wijnen WHF. Onder of boven de maat. Een methode voor het bepalen van de grens voldoende/onvoldoende bij studietoetsen. Diss. Swets en Zeitlinger, 1971.