Recently, Van Berkel (1984) drew up an inventory of 77 distinct types of validity and classified them into four major categories (Cronbach et al, 1955; Cronbach, 1970; de Groot, 1972; Cook et al, 1979; Thorndike, 1982).
These four categories of validity are:
- Criterion-orientated validity, which correlates results of a test with a criterion outside the test situation;
- Content validity, which refers to how adequately the content of the test represents the universe that the test intends to measure;
- Construct validity, which analyses the meaning of test scores in terms of psychological constructs;
- Experimental validity, which studies the generalizability of conclusions derived from experiments to situations outside the experimental setting.
Inferences from the first three validity types are based on what a subject achieves on the pertinent and related tests, whereas inferences from the last validity type are based on a critical appraisal of the design of the test setting.
Philipsen (1984) approached the issue of validity slightly differently by differentiating between two dimensions in validity studies.
- Firstly, he recognized the goals researchers try to achieve, ranging in hierarchical order from face validity through content validity to construct validity.
- Secondly, he differentiated between the procedures which can be applied: predictive validity, discriminant validity or concurrent validity.
By combining both dimensions, nine types of validity are discerned. Philipsen’s contribution emphasizes the depth of analysis with regard to each procedure. As textbooks are organized around the four major types of validity mentioned by van Berkel, they are elaborated on in the following paragraphs, but we keep in mind that the depth of analysis can vary for each procedure.
In criterion-orientated validity, the question studied is how well test scores predict criterion performance. Criterion-orientated validity is sometimes called concurrent validity when no time has elapsed between the measurements, or predictive validity when a criterion is predicted for the future.
Criterion-orientated validity is primarily applied:
- To tests which are used to select or classify subjects, such as students, patients or employees;
- To tests which are used to decide what treatment should be given to subjects;
- And to tests which are used as a substitute for a more cumbersome assessment procedure (Cronbach, 1970).
Criterion-orientated validity is operationalized by the correlation between test performance and future criterion performance:
- High or modest correlations confirm the criterion-orientated validity of a test.
- Criterion-orientated validity only provides firm evidence of validity when the measurements are intended as predictors in a specific research setting with a specified criterion which is measured validly in itself (de Groot, 1972).
The construction of criterion measurements forms the greatest problem for predictive validity since it is difficult, but vital, to obtain suitable, valid measurements of the criterion. Difficulties may arise when the criterion behavior is multi-dimensional, vague or equivocal. Theoretical considerations of the relationship between predictor and criterion are essentially unimportant in the determination of criterion-orientated validity.
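The operationalization described above can be sketched numerically. The following fragment, a minimal illustration using hypothetical scores rather than MAAS data, computes the correlation between test performance and a later criterion measurement:

```python
import numpy as np

def predictive_validity(test_scores, criterion_scores):
    """Pearson correlation between test scores and later criterion scores.

    A high correlation supports the test's criterion-orientated
    (here: predictive) validity; the criterion itself must be
    measured validly for the coefficient to be meaningful.
    """
    test = np.asarray(test_scores, dtype=float)
    criterion = np.asarray(criterion_scores, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(test, criterion)[0, 1])

# Hypothetical example: interviewing-skill ratings at graduation and
# ratings of the same physicians after several years in practice.
now = [62, 55, 71, 48, 66, 59, 74, 52]
later = [68, 50, 75, 45, 70, 55, 80, 49]
r = predictive_validity(now, later)
```

Note that, as stated above, such a coefficient carries evidential weight only when the criterion measurement is itself valid.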
MAAS-MI
Although we acknowledge the importance of criterion-orientated validity, especially for selection or treatment purposes, we do not apply it to the MAAS. An adequate research design for the determination of the criterion-orientated validity or, more precisely, the predictive validity, of the MAAS, would be to record students’/future physicians’ medical interviewing skills by means of the MAAS at this moment and then correlate them with measurements of their interviewing skills recorded after the physicians have had several years of experience in daily practice (Crijnen et al, 1984). The strength of the correlations would indicate MAAS’s criterion-orientated validity. Since the time-lag necessary to obtain the future criterion measurements is beyond the scope of our project, we have been unable to study the predictive validity of the MAAS. Determination of the MAAS’s criterion-orientated validity should certainly be carried out in the future as the MAAS is already used to classify and select medical students.
The establishment of criterion-orientated validity in terms of concurrent validity is elaborated in the sections on construct validity since theoretical considerations are strongly taken into account.
Content validity is studied when researchers evaluate whether the items of the test adequately represent the universe the test intends to measure. Determination of content validity is especially required when tests are designed to measure the degree of mastery of some domain of knowledge or skill.
Content validity was enhanced during construction of the Maastricht History-taking and Advice Checklist through:
- Participation of a large group of physicians and psychologists who continually scrutinized the content of the MAAS-MI;
- Attempts to define as clearly as possible:
- The context in which we are interested (initial medical consultations);
- The behavior we have tried to measure (six different categories of medical interview behavior);
- The task given to the subjects (perform a medical interview);
- Explicit formulation of the theoretical considerations of medical interviewing skills and empirical evidence which were used to construct the scales.
Content validity is determined by judgment of experts on how adequately the domain of interest is represented in the test:
- A prerequisite for the assessment of content validity is a clear, detailed and explicit definition of the universe a researcher wishes to measure (see also Building Blocks for MAAS-MI). This definition ought to specify the kinds of tasks and situations covered by the universe, the kinds of responses the observer wishes to count, and the instructions to the subjects (Cronbach, 1970);
- Unfortunately, objective determination of a test’s content validity is difficult since few attempts are made to develop quantitative indices of content validity.
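One of the few quantitative indices that have been proposed is Lawshe's content validity ratio, which summarizes expert judgments per item; it is not applied to the MAAS here, but a minimal sketch shows how such an index is computed:

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's content validity ratio for a single item:
    CVR = (n_e - N/2) / (N/2), where n_e of the N expert
    judges rate the item as 'essential'.
    Ranges from -1 (none) to +1 (all)."""
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical panel: 9 of 10 judges rate an item as essential.
cvr = content_validity_ratio(9, 10)   # (9 - 5) / 5 = 0.8
```

Item-level values near +1 indicate consensus that the item belongs to the domain; values near or below zero flag items for removal.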
MAAS-MI
Since content validity was secured during test construction, we made one additional systematic effort to objectify content validity of the MAAS (see also Content Validity MAAS-Mental Health).
In 1955, Cronbach and Meehl recommended that the construct validity of new tests should be established in addition to the criterion-orientated validation procedure which was used at that time but was severely criticized.
They defined construct validity as the analysis of the meaning of test scores in terms of psychological concepts or constructs. In interpreting test scores, researchers have to face the question: What constructs account for variance in test performance? Constructs were seen as ‘some postulated attributes of people assumed to be reflected in test performance’ (Cronbach et al, 1955). The concept of constructs was developed to describe or account for certain recurring characteristics of a subject’s behavior (Thorndike, 1982).
What constructs account for variance in test scores?
Theories About a Construct Although constructs cannot be assessed directly, researchers have developed a theory of the construct to a certain level of sophistication. They know how a construct will express itself, what sub-groups in the population possess it to a high or low degree, what conditions favor or inhibit expression of the construct, what test-tasks elicit the construct, etc. These theoretical considerations form an essential part of construct validity, since they suggest the kinds of evidence that are relevant for assessing how well a measurement depends upon the construct.
Cronbach (1970) described a general outline for establishing construct validity that was based on these theoretical considerations:
- First of all, researchers have to suggest what constructs might account for test performance;
- Secondly, testable hypotheses are derived from the theory surrounding the construct;
- Finally, researchers carry out studies to test the hypotheses empirically.
Theoretical reflections on the behavior of the construct under study underlie all procedures for the investigation of construct validity. Over the course of time, several procedures have been elaborated to evidence a measure's construct validity. We confine ourselves here to the four procedures described by Thorndike (1982), which rely heavily on the original work of Cronbach and Meehl (1955).
1. Comparison of test tasks with conception of the attribute
The first question to ask about a method of measurement is: Do the items and the test task appear to call for the construct in question? Is the content reasonable for eliciting the construct we wish to measure? Congruence between the assumed construct and item content forms a first indicator of the essential nature of our method of measurement, but is in no way conclusive. Unfortunately, no precise methods are available for properly outlining the item or variable domain of a construct (Nunnally, 1967). This matter is left entirely to the researcher’s understanding of the construct. This procedure comes close to establishing content validity (de Groot, 1972).
2. Correlational evidence of construct validity
This procedure comes closest to the meaning of construct validity. It states, very generally, that:
- Measurements should show substantial correlations with different measurements of the same construct, as well as with measurements of theoretically related constructs;
- Whereas low correlations are expected with measurements of other, theoretically unrelated attributes.
This type of construct validity is elaborated in two distinct directions:
- Firstly, the assessment of the nomological network;
- Secondly, the assessment of convergent and divergent validity (Cronbach et al, 1955; Campbell et al, 1959; Thorndike, 1982).
Nomological Network Construct validity of a test is underscored when the relations in the nomological network, defined as the interlocking system of laws which constitute a theory, are supported by empirical evidence.
The nomological network describes the system of laws which constitute a theory
The nomological network of a theory contains a theoretical model, related hypotheses and predictions which include empirical references, and empirical evidence stemming from previous validity studies (de Groot, 1972). Based on the nomological network, researchers develop hypotheses about the nature and strength of relations between the constructs under study and other constructs. They make judgments about the nature of certain activities and the skills required to perform them successfully.
In construct validity, these judgments are tested. When the predicted relations appear empirically, the construct validity of the measurements of the construct is supported. The relations predicted by the nomological network should be able to explain the strength of the correlations. An additional advantage of this procedure is that the researcher's understanding of the coherence of behavior in daily life is increased.
A fortunate side-effect of studying construct-validity is that we understand the coherence of interviewing skills with their effects on patient and physician better
When the relations fail to appear, the nomological network and/or the construct validity of the measurements of the construct are questioned. The uncertainty about the interpretation of negative results led Nunnally (1967) to discredit the idea that sufficient evidence for construct validity is brought forward when the supposed measurements of a construct behave as expected. He stated that all that can be tested is the correlation between measurements of constructs, whereas researchers draw conclusions about both the theory which surrounds the test and the construct validity of the measurements. Studies of construct validity are only safe, according to Nunnally, when, firstly, a supposed measurement of a construct is related to a particular observable variable whose domain is well defined and, secondly, the assumed relationship between the two constructs is unarguable. Moreover, Nunnally warned researchers against assuming that constructs have objective reality. He proposed that a construct's name could act as a useful way of labelling a particular set of observable variables. Validity would then be indicated by the extent to which the name accurately communicates the kind of observables being studied.
The relations between a physician's interview behavior, as measured by MAAS-MI G and MAAS-MI MH, and several outcomes of the interview, such as patient satisfaction and the quality of the diagnosis and treatment plan, are studied here to determine the nomological net of both instruments. Studies are carried out in a simulated consultation hour and in consultation hours in which general practitioners interview real patients. See also: Evidence Base – MAAS-MI G and Evidence Base – MAAS-MI MH.
Convergent and Divergent Validity The assessment of a test’s convergent and divergent validity by means of the multitrait-multimethod matrix has been recommended by several authors as an appropriate way of assessing the identifiability of a proposed construct (Campbell et al, 1959; Fiske, 1971; Cronbach, 1972; Kerlinger, 1981; Thorndike, 1982).
Convergent validity refers to the assessment of the same construct by means of different methods, whereas divergent validity refers to the assessment of distinct constructs by means of the same and/or other methods. Campbell and Fiske (1959) approached the assessment of a test’s convergent and divergent validity systematically by applying each of several methods of measurement to each of several constructs.
They proposed the examination of the resulting matrix of correlations according to four criteria, which refer to:
- Convergence of the methods with regard to a pertinent construct;
- Divergence of constructs and divergence of methods;
- A general pattern of correlations among the constructs.
This classical approach to convergent and divergent validity brings evidence to bear on the quality of the representation of the construct by the content of the test and it brings to light systematic variance introduced by the method of measurement.
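As an informal illustration of these criteria, a small multitrait-multimethod matrix can be examined as follows; the matrix below uses hypothetical correlations, not MAAS-MI data:

```python
import numpy as np

# Hypothetical MTMM matrix: 2 traits (T1, T2) x 2 methods (M1, M2).
# Row/column order: M1-T1, M1-T2, M2-T1, M2-T2.
R = np.array([
    [1.00, 0.30, 0.65, 0.20],   # M1-T1
    [0.30, 1.00, 0.25, 0.60],   # M1-T2
    [0.65, 0.25, 1.00, 0.35],   # M2-T1
    [0.20, 0.60, 0.35, 1.00],   # M2-T2
])

# Convergent validities: same trait measured by different methods
# (the 'validity diagonal').
convergent = [R[0, 2], R[1, 3]]

# Heterotrait-heteromethod: different trait AND different method.
hetero_hetero = [R[0, 3], R[1, 2]]

# Heterotrait-monomethod: different traits within the same method
# (inflation here signals shared method variance).
hetero_mono = [R[0, 1], R[2, 3]]

# Campbell and Fiske's criteria, informally checked: the validity
# diagonal should be substantial and exceed the heterotrait values.
convergence_ok = all(v > 0.5 for v in convergent)
divergence_ok = all(v > max(hetero_hetero + hetero_mono) for v in convergent)
```

The thresholds used above are illustrative; Campbell and Fiske (1959) stated the criteria as comparative patterns rather than fixed cut-off values.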
We study the convergent and divergent validity of MAAS-MI G and MAAS-MI MH by constructing a traditional multitrait-multimethod matrix. See also: Conv. & Div. Validity MAAS-MI G and Conv. & Div. Validity MAAS-MI MH.
Clinical Competency In addition to the classical approach, several methodologists have described a less elaborate procedure to determine convergent and divergent validity by stating that the theory of a construct should be able to explain what other variables are correlated or uncorrelated with the measurements of the construct (Kerlinger, 1981; Thorndike, 1982). This procedure fails to provide information about the influence of shared method variance, but enables researchers to describe the content of the constructs more effectively. In this way, it is closely related to the assessment of the nomological net.
We apply this procedure to establish the relations between measurements of medical interviewing skills (MAAS-MI G) and other dimensions of medical competency. See also: Interviewing Skills & Clinical Competency.
3. Group differences as evidence of construct validity
If the understanding of a construct leads researchers to expect that distinct groups of subjects will respond differently to their measurements, this hypothesis can be tested. Evidence of construct validity is obtained when the hypothesis that the groups differ on the specific issue is supported by the data (Cronbach et al, 1955; Thorndike, 1982). When researchers apply this kind of construct validity, they have to be aware that they simultaneously test their understanding of and theory about the differences between the groups and the construct validity of their measurements. Positive results affirm both; negative results may stem from a shortcoming in one or both of them.
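This procedure can be sketched as a simple two-group comparison. The fragment below uses hypothetical scores and group labels, not MAAS data, and computes Welch's t statistic; a full analysis would also attach a significance level to the statistic:

```python
import math

def welch_t(a, b):
    """Welch's t statistic for two independent groups
    (does not assume equal variances)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    return (mean_a - mean_b) / math.sqrt(var_a / na + var_b / nb)

# Hypothetical interviewing-skill scores: if the scale taps the
# intended construct, final-year students should outscore
# first-year students.
final_year = [72, 68, 75, 70, 74, 69]
first_year = [55, 60, 52, 58, 57, 54]
t = welch_t(final_year, first_year)   # large positive t supports the hypothesis
```

A large t in the predicted direction supports the construct validity of the measurements; as noted above, a small t leaves it open whether the theory about the groups or the measurements is at fault.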
4. Treatment effects as evidence of construct validity
Any experimentally introduced intervention or any naturally occurring change in conditions that might be expected to influence the construct under study can be used to study construct validity (Cronbach et al, 1955; Thorndike, 1982). Construct validity is supported when scores are in the predicted direction. When two measurements are affected similarly by a variety of treatments, the suggestion is raised that they are measuring much the same trait, which is a slightly different way of assessing convergent validity (Nunnally, 1967). Whether the degree of stability is encouraging or discouraging for the proposed interpretation depends upon the theory defining the construct (Cronbach et al, 1955). Furthermore, Thorndike (1982) remarked that measurements of states, as contrasted with measurements of traits, are especially sensitive to interventions. Traits are expected to be relatively insensitive to manipulations of conditions. The impact of an intervention on a pertinent construct provides useful information about the construct.
We have studied the growth of medical students’ interviewing skills during medical school. Results supported the construct validity of MAAS-MI G measurements of interviewing skills. See also: Growth in Interviewing Skills over Medical School.
5. Conclusions about construct validity
It is evident that Cronbach and Meehl’s thoughts on construct validity form a fruitful contribution to the study of validity. Construct validity is essentially based on two important notions. Firstly, researchers formulate a nomological network from which testable hypotheses may be derived on the relation between the construct under study and other constructs. Secondly, researchers confirm or refute the hypotheses based upon empirical evidence stemming from a variety of test situations. Moreover, construct validity forces researchers to be very explicit about the theory which surrounds their constructs.
The approaches to construct validity provided a useful frame of reference for the development of procedures for determining the construct validity of the MAAS-MI General and MAAS-MI Mental Health.
A more extensive description of how the theoretical notions of construct validity were used to develop research settings, procedures and additional instruments will be provided further in this chapter.
The fourth type of validity described here differs from those addressed in the preceding paragraphs, because it concerns the justification of generalizations drawn from the results of experiments to situations outside the experiment.
With regard to this issue, Campbell and Stanley (1966) and Cook and Campbell (1979) introduced two types of validity: internal and external validity.
- Internal validity refers to the inferences made by researchers that a relationship between two variables is causal or that the absence of a relationship implies the absence of cause. The concepts of covariation, time sequence and confounding variables are important in internal validity.
- External validity refers to the inferences researchers make that the presumed causal relationship can be generalized to and across different types of persons, settings and times.

Matters of internal and external validity are of importance to the study of the validity of the MAAS and are discussed in the pertinent chapters.
With regard to the external validity of the MAAS-MI G and MAAS-MI MH influences of different physicians, different simulated-patients, real patients, different cases, different groups of subjects, etc., have to be taken into account in our measurements of interviewing skills. Some of these issues will be elaborated on in this thesis/site, such as case-influences or the influence of simulated-patients, whereas other influences have to be reserved for future studies. To enhance the external validity of the MAAS, the study of physicians’ interviewing skills while they are talking with real patients during their daily practice is important, but was only partially carried out.