Face validity

It would not be a surprise if the majority of dissertations at the undergraduate and master's level rely heavily on face validity (also known as logical validity), typically because it is the easiest form of validity to apply. Unfortunately, face validity is arguably the weakest form of validity and many would suggest that it is not a form of validity in the strictest sense of the word. In this article, we (a) explain what face validity is, (b) present examples of face validity, and (c) discuss the advantages and disadvantages of using face validity in dissertations.

What is face validity?

Face validity could easily be called surface validity or appearance validity since it is merely a subjective, superficial assessment of whether the measurement procedure you use in a study appears to be a valid measure of a given variable or construct (e.g., racial prejudice, balance, anxiety, running speed, or emotional intelligence). To provide some explanation:

When we examine a variable (or construct) in a study, we choose one of a number of possible ways to measure that variable (or construct). For example, we may choose to use questionnaire items, interview questions, and so forth. These questionnaire items or interview questions are part of the measurement procedure. This measurement procedure should provide an accurate representation of the variable (or construct) it is measuring if it is to be considered valid. For example, if we want to measure intelligence, we need to have a measurement procedure that accurately measures a person's intelligence. Since there are many ways of thinking about intelligence (e.g., IQ, emotional intelligence, etc.), this can make it difficult to come up with a measurement procedure that has strong validity. However, there are ways to assess how valid a measure is; for example, by assessing its construct validity or content validity [see the articles: Construct validity and Content validity]. Another option is to assess the measure's face validity.

Face validity is only considered to be a superficial measure of validity, unlike construct validity and content validity, because it is not really about what the measurement procedure actually measures, but what it appears to measure. This appearance is only superficial. One of the main reasons that researchers are interested in face validity is a belief that a measure should appear to measure what it measures. For example, if a research participant thinks that they are completing a questionnaire to identify the best football players in the league, the following question may have strong face validity: How many times have you played in the 1st squad/team this year? After all, it appears to make sense that the best players would have played the most games in the top team for a club. This is what we mean by face validity. However, in reality, the number of times played in the top team/squad may not be a good measure of the best football players. Some of the top players are rested more often by their clubs. Players may be on the field a lot, but score few goals or make few assists. The coach may want to give younger players more experience by picking them for less challenging games. The coach may have a system of playing that favours certain types of players (e.g., those with good crossing ability), which encourages the coach to pick players that, player-for-player, are inferior.

Whilst face validity is often used as the main form of validity for assessing measurement procedures in undergraduate and master's level dissertations, this is not always the case. When you think about the use of face validity in your research, it is important to consider it in context, either as the main form of validity for your measurement procedure, or as a supplemental form of validity to other types of validity (e.g., construct validity and content validity).

Examples of face validity

Before we discuss some of the advantages and disadvantages of using face validity in your research, we have provided a few more examples to highlight how face validity can be used, and some of the associated problems.

Variable (or construct) to be measured
Racial prejudice

Face valid measures
A very obvious, direct/explicit questionnaire item:
I think that African Americans are inferior to Whites [or vice versa]. (Yes/No or Likert scale)
A less obvious, but still direct/explicit questionnaire item:
"How would you react if a family member wanted to bring a Black friend to dinner?" (Wittenbrink et al., 1997, p. 262)

More valid measures
Relating implicit measures of stereotyping and prejudice to explicit measures to identify "true" stereotyping and prejudice; other implicit measures such as the Implicit Association Test (IAT) (Wittenbrink et al., 1997; Quillian, 2006)

Clearly, even an individual who is racially prejudiced is highly unlikely to agree with a statement such as: "I think that African Americans are inferior to Whites" [or vice versa], especially in a face-to-face interview/questionnaire situation. Rather, methods that have limited face validity may be more appropriate. Research shows that when common questionnaire measures that assess racial prejudice more directly (i.e., more explicitly) are used, people answer in a way that tries to reduce the appearance that they are racially prejudiced. This can be alleviated by asking people more indirect (i.e., more implicit) questions about their racial attitudes (Wittenbrink et al., 1997). It could also be argued that (a) these more indirect/implicit questionnaire items have weaker face validity, and (b) the strong face validity of the direct/explicit questionnaire items increases the likelihood of people giving the answer that they feel is socially acceptable (or what they feel the researcher would want to hear).

Variable (or construct) to be measured
Balance

Face valid measures
The time a person can balance on one foot with (or without) their eyes closed (Bohannon et al., 1984)

More valid measures
Romberg tests that use forceplates to examine the sway of a person when standing or balancing on one foot (Bohannon et al., 1984)
Star Excursion Balance Tests (SEBTs) that are more sensitive to detecting motor control deficits in individuals, and which are more demanding than simple balancing tests (Olmsted et al., 2002)

At first sight, it seems logical to assess a person's balance by examining how long they can balance on one foot (Bohannon et al., 1984). This measure of balance has strong face validity, and was viewed as a valid means of measuring balance for some time. Over time, other more sensitive tests were created that examined some of the intricacies of balance. Some of these, such as the Romberg tests, involve people standing on one or two feet (Bohannon et al., 1984). However, others criticised such basic standing techniques for failing to detect subtler motor control deficits in individuals. As such, techniques such as the Star Excursion Balance Tests (SEBTs) were created, which involve people reaching in different directions whilst standing on a single foot (Olmsted et al., 2002). Whilst these techniques still have strong face validity, they also have greater construct validity.

Variable (or construct) to be measured
Anxiety

Face valid measures
"My stomach gets upset when I think about taking tests"
"My heart starts pounding fast whenever I think about all of the things I need to get done"
(from Kaplan & Saccuzzo, 2008, p. 136)

More valid measures
"Feeling of choking"
"Fear of losing control"
From the Beck Anxiety Inventory (BAI; Beck & Steer, 1990)
"I feel nervous and restless"
"I wish I could be as happy as others seem"
From the State-Trait Anxiety Inventory (STAI; Spielberger, 1985)

There are many questions or statements that could be used to measure anxiety. It could be argued that the two face valid examples above show strong face validity (e.g., "My stomach gets upset when I think about taking tests" [Kaplan & Saccuzzo, 2008, p. 136]). However, these statements do not have strong construct validity or content validity. Anxiety is actually a complex concept. For example, the items "I feel nervous and restless" and "I wish I could be as happy as others seem" both come from the State-Trait Anxiety Inventory (Spielberger, 1985), which uses different items to distinguish between anxiety as a trait variable and anxiety as a state variable (Kabacoff, 1997). The example items from the Beck Anxiety Inventory, "feeling of choking" and "fear of losing control" (Beck & Steer, 1990), reflect how there can be an overlap between anxiety and another concept, depression, which should be taken into account when trying to measure anxiety (Kabacoff, 1997). Unlike the first two examples, which had strong face validity, these more complex measures aim to reflect the concept of anxiety more accurately; they have much stronger content validity and construct validity.

Variable (or construct) to be measured
Emotional intelligence

Face valid measures
I am good at judging others
I am in control of my emotions

More valid measures
The use of questionnaire items relating to the emotional competences of self-awareness, accurate self-assessment, and self-confidence, which make up one of four emotionally intelligent domains; in this case, self-awareness (the other three domains are social awareness, self-management, and social skills/relationship management, each of which has its own emotional competences) (Boyatzis et al., 1999; Goleman et al., 2002). Other valid measures of emotional intelligence have also been suggested (e.g., Mayer & Geher, 1996; Mayer et al., 2000).

It's not too difficult to imagine measures that could be used for emotional intelligence (e.g., being good at judging others or being in control of our emotions). But these only demonstrate face validity. Just like anxiety, emotional intelligence is a complex concept. As the example above illustrates, measurement procedures used to capture emotional intelligence aim to measure a wide range of emotionally intelligent domains and emotional competences. Not all of these measures may necessarily appear face valid, but they do demonstrate stronger construct and content validity (e.g., Boyatzis et al., 1999; Mayer & Geher, 1996; Mayer et al., 2000; Goleman et al., 2002).
