External validity | Lærd Dissertation

Constructs, methods, confounding, and external validity

Whether we can make generalisations from our sample to the wider population, or across treatments, will also depend on how we operationalised our study; that is, how we defined the constructs and variables we wanted to measure, what treatments (i.e., interventions) we made, and so forth. In this section, we focus on threats to external validity that can arise from using (a) a single measure and (b) a single method for a construct. Each of these is discussed in turn:

Generalisation and constructs

In quantitative research designs, we have to narrow down broad concepts we may be interested in into constructs that can be measured [see the article: Constructs in quantitative research].

For example, we may be interested in a broad concept such as intelligence. However, the concept, intelligence, can be viewed from a wide range of perspectives. If we were to try to examine such a broad concept, we would not be confident that our study was construct valid, which would lead to considerable criticism [see the article: Construct validity]. Therefore, rather than tackling broad concepts such as intelligence, we focus on more specific constructs. In terms of the concept, intelligence, we may choose to look at a common construct such as IQ. However, there are other ways to perceive and measure intelligence, including EQ (or emotional intelligence).

Since experiments cannot take into account all of the different ways that a concept can be perceived and measured, it will only be possible to make generalisations from your results within the confines of the operational definition you provided for your concept. By operational definition, we mean the construct you decided to study from some broader concept (i.e., how you perceived a concept), and the specific way you decided to measure that construct (i.e., the variables you used to represent the construct) [see the article: Constructs in quantitative research]. This points to two types of generalisation we need to be careful about when it comes to constructs: generalising across constructs and generalising across measures for a given construct. Each is discussed in turn:

Generalising across constructs

It is not always so easy to distinguish between different constructs that are part of the broader concept you are interested in [see the article: Constructs in quantitative research]. Assessing the divergent validity of a measurement procedure can help you to check that the measurement procedure does not measure multiple constructs [see the article: Criterion validity (convergent and divergent validity)]. However, ultimately this is about conducting a thorough literature review in order to understand the boundaries of the construct(s) you are interested in.

For the most part, you cannot make generalisations across constructs with confidence. For example, you cannot make generalisations about EQ (i.e., emotional intelligence) when studying IQ because they are such different constructs, even though they both fit within the broader concept of intelligence. This highlights the importance of setting an operational definition that is construct valid [see the article: Construct validity].

Generalising across measures for a given construct

Whilst there are many different ways in which a given construct can be measured, it is not uncommon in quantitative research, at least at the undergraduate and master's level, to only use a single measure for that construct. To explain what we mean by this, and how it can become a threat to external validity, let's look at the example in Study #2 below:

Study #2
The relationship between background music and task performance amongst employees at a packing facility
Note: If you have read the article, Extraneous and confounding variables, you will already have come across this example.

The study aims to examine the relationship between background music (i.e., construct A) and task performance (i.e., construct B) amongst employees at a packing facility (e.g., Amazon, Wal-Mart, Tesco, etc.). In these packing facilities, the job of employees is to collect items ordered by customers from the warehouse, package them, stick on a label with the customer's address, and put the package on the delivery line. Each time an employee does this, they complete one task.

The purpose of the study is to find out what effect background music might have on employees' task performance; that is, how many packages (i.e., tasks) they process in a given hour. This is important to firms because if they find that background music has a positive effect on task performance (i.e., that background music increases the number of packages processed in a given hour), they may want to roll out a programme of background music in all of their packing facilities.

The independent variable is background music (i.e., construct A), which is a nominal variable because employees either are or are not provided with background music. The dependent variable is task performance (i.e., construct B), which is a continuous variable, measured in terms of the number of tasks employees perform correctly per hour.

The independent variable, background music, consists of a control and a treatment. The control refers to the normal conditions experienced by employees in the packing facility, which, in this case, means that employees are not being provided with background music (i.e., employees without background music). The treatment is the intervention that we are making to compare the addition of background music with the normal conditions (i.e., with the control) in the packing facility. In other words, the treatment is providing the employees with background music. It is this independent variable (i.e., background music) that we are manipulating to examine its effect on the dependent variable (i.e., task performance). We use the word manipulating because we are taking the independent variable and changing it (i.e., with or without background music) for different groups (i.e., the control group and treatment group).

So in order to conduct this experiment, we take a sample of employees at the packing facility (e.g., a sample of 100 employees from the total of 400 employees that work there, which is known as the population). We then randomly assign half of these sample employees (i.e., 50 employees) to the control group and the other half (i.e., 50 employees) to the treatment group. At a given day and time, we start the experiment; the control group continues with their normal day without any music, whilst the treatment group gets to listen to music. The experiment continues for an 8-hour shift. For each of these 8 hours, we record the number of tasks each employee performs correctly, both for the control group and the treatment group. This task performance is our dependent variable (also known as an outcome variable).
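The sampling and random assignment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee IDs, the function name assign_groups, and the fixed seed are hypothetical and are not part of Study #2.

```python
import random

def assign_groups(population_ids, sample_size, seed=None):
    """Draw a simple random sample and split it evenly into control and treatment."""
    rng = random.Random(seed)
    # Sampling without replacement; the result is in random selection order,
    # so slicing it in half amounts to a random assignment to the two groups.
    sample = rng.sample(population_ids, sample_size)
    half = sample_size // 2
    return sample[:half], sample[half:]

# Hypothetical employee IDs 1..400 standing in for the population of 400 employees.
population = list(range(1, 401))
control, treatment = assign_groups(population, sample_size=100, seed=42)
print(len(control), len(treatment))
```

Fixing the seed only makes the illustration repeatable; in a real study, the assignment procedure would simply need to be random and documented.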

Under normal circumstances, we would then statistically analyse our results by comparing the scores on the dependent variable (i.e., the number of correctly performed tasks per hour) between the two groups (i.e., the control group and the treatment group). This should show us whether there are any differences in the number of tasks performed between the control group and treatment group. This would, in theory, tell us about the relationship between background music and task performance amongst the employees at the packing facility. When we perform the analysis on the data from the two groups, we find that (a) there is a difference in task performance between the two groups, (b) the difference is statistically significant, and (c) the difference equates to a 10% increase in task performance. In other words, we found that the addition of background music improved the task performance of employees by 10% compared to the control group that had no background music. Since our statistical analysis shows that the relationship between background music and task performance was statistically significant, we conclude with some confidence that background music improves task performance in the packing facility.
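As a rough illustration of this kind of two-group comparison, the sketch below computes the percentage difference in mean task performance and Welch's t statistic using only the Python standard library. The data are simulated, the group means and spread are made-up values, and Welch's t-test is just one reasonable choice of test; the study itself does not say which test was used.

```python
import math
import random
import statistics

def welch_t(control, treatment):
    """Welch's t statistic for the difference in means (treatment minus control)."""
    var_c = statistics.variance(control)    # sample variance (n - 1 denominator)
    var_t = statistics.variance(treatment)
    standard_error = math.sqrt(var_c / len(control) + var_t / len(treatment))
    return (statistics.mean(treatment) - statistics.mean(control)) / standard_error

def percent_change(control, treatment):
    """Difference in group means, as a percentage of the control group mean."""
    mean_c = statistics.mean(control)
    return 100.0 * (statistics.mean(treatment) - mean_c) / mean_c

# Simulated (made-up) mean tasks per hour for 50 employees in each group.
rng = random.Random(0)
control = [rng.gauss(40, 4) for _ in range(50)]
treatment = [rng.gauss(44, 4) for _ in range(50)]

print(f"percent change: {percent_change(control, treatment):.1f}%")
print(f"Welch t statistic: {welch_t(control, treatment):.2f}")
```

Turning the t statistic into a p-value requires the t-distribution, which a statistics package would normally provide; that step is left out here to keep the sketch self-contained.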

Now let's go back to the danger of trying to make generalisations across measures when only using a single measure for that construct. Let's just look at one construct from Study #2, construct A, which was background music, our independent variable. We measured background music in a very simple way; that is, background music was presented as a nominal variable because employees either were or were not provided with background music. However, the construct, background music, is actually much more complex. Think about the following aspects of music:
- Type of background music (e.g., chart music, dance/electronic music, easy listening, classical music, etc.)
- Loudness of background music (e.g., low, medium, high volumes, etc.)
Since only one type of background music was played during the experiment (e.g., easy listening), which was played at a medium volume (i.e., loudness), we measured our independent variable, background music, using just a single measure. The problem that we face when viewing the construct, background music, in such a simple way, is that we ignore the potential real effect that this construct has on the dependent variable; in this case, task performance. Known as mono-operation bias, this becomes a threat to the construct validity of the measurement procedure we used. We are potentially under-representing the construct we are trying to measure [see the article: Construct validity]. Indeed, if we had taken into account the different aspects of the construct, background music, including the type of background music, the loudness of the background music, and so on, we would have a much more accurate understanding of the relationship between construct A (i.e., background music, the independent variable), and construct B (i.e., task performance, the dependent variable). This would have improved the construct validity of our measurement procedure.
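To see how narrow a single measure is, it can help to enumerate the combinations of music type and loudness that the construct could take. The short sketch below does this for the example values listed above; the lists are illustrative only (the original lists end in "etc.", so the real set of combinations is larger), and the variable names are hypothetical.

```python
from itertools import product

# Example values taken from the two aspects of background music listed above.
music_types = ["chart", "dance/electronic", "easy listening", "classical"]
loudness_levels = ["low", "medium", "high"]

# Every combination of type and loudness that could, in principle, be tested.
conditions = list(product(music_types, loudness_levels))

# The single operationalisation used in Study #2.
studied = {("easy listening", "medium")}
untested = [c for c in conditions if c not in studied]

print(f"{len(studied)} of {len(conditions)} combinations studied")
for music_type, loudness in untested:
    print(f"not tested: {music_type} at {loudness} volume")
```

Even with only these example values, the experiment covers one combination and leaves the rest unexamined, which is the under-representation that mono-operation bias refers to.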

However, the use of a single measure also becomes a threat to external validity because we have to be extremely careful when making generalisations about the behaviour of a given construct when only using a single measure. In other words, we have to be careful about making generalisations about the effect of construct A, background music, on construct B, task performance, because we are talking about the effect of background music on task performance in general. We are not claiming that background music only increases task performance when a specific type of background music is played (e.g., easy listening) at a specific loudness (e.g., a medium volume). Instead, we are making generalisations from our results about background music in general; in other words, all types of background music, and all volumes of background music. When we make such generalisations without using multiple measures, we are, in effect, making generalisations across measures for a given construct. This should be done with considerable care, recognising the potential threat to external validity that arises from making generalisations using a single measure of a given construct.