External validity | Lærd Dissertation

Generalisation and methods

Just as there are problems arising from making generalisations from a single measure, as discussed in the previous section, external validity can also be threatened when using a single method to measure a given construct. Known as mono-method bias, it threatens the construct validity of the measurement procedure you use [see the article: Construct validity]. Before we reflect back on the threat to external validity from using a single method, let's look at the problems that can arise from using a single method, based on our example in Study #2:

Study #2
The relationship between background music and task performance amongst employees at a packing facility

To briefly recap, our study examined the relationship between background music (i.e., construct A) and task performance (i.e., construct B) amongst employees at a packing facility (e.g., Amazon, Wal-Mart, Tesco, etc.). The independent variable was background music (i.e., construct A), whilst the dependent variable was task performance (i.e., construct B). One group of employees listened to background music whilst working (i.e., the treatment group), whereas the other group were not provided with any background music (i.e., the control group). The task performance of employees was measured in terms of the number of tasks employees perform correctly per hour. The method used to listen to background music was a loud speaker (i.e., stereo system), whilst task performance was measured using an e-packing system, which automatically collected data on the number of tasks correctly performed by employees.

Let's imagine some of the multiple methods that could be used to measure the two constructs (i.e., construct A, background music, and construct B, task performance):

Independent variable
Method #1: Listening to music through the loud speaker (i.e., stereo system)
Method #2: Listening to music using a personal iPod and headphones

Dependent variable
Method #1: Data automatically collected through the e-packing system
Method #2: Supervisor rating the speed of the packer

Note that sometimes a mono-method (i.e., a single method) is appropriate. For example, Method #1 for the dependent variable (i.e., data being automatically collected through the e-packing system) may be the most accurate measure of task performance in this piece of research. After all, Method #2, where the supervisor rates the speed of the packer, is more likely to suffer from experimenter bias or instrumental bias than an automated system. However, this is often not the case, and the use of multiple methods reduces the threat to construct validity and external validity.

When considering mono-method bias in your dissertation, you need to ask yourself:

Would the same results have been recorded if the independent variable, background music, had been operationalized using a different method; in this case, using Method #2 (i.e., listening to music using a personal iPod and headphones) rather than Method #1 (i.e., listening to music through the loud speaker/stereo system)?
Would the use of multiple methods have provided greater insight into the construct than just a single method; that is, would multiple methods have reduced the potential for method bias to affect the scores on the dependent variable?

If the measurement procedure consists of a single method to assess the independent and/or dependent variables, this can act as a threat to construct validity. This is because the method used may introduce bias, changing the scores on the independent or dependent variable. In addition, the use of a single method becomes a threat to external validity because we may be making generalisations from our results that are really only accurate when using a specific type of method.
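
One way to make this question concrete is to compare the estimated treatment effect under each method of measuring the dependent variable. The sketch below is illustrative only: the scores are hypothetical numbers invented for this example (they are not data from Study #2), and the function name `treatment_effect` is an assumption. It computes the difference in mean task performance between the treatment group and the control group, separately for each method.

```python
from statistics import mean


def treatment_effect(treatment_scores, control_scores):
    """Difference in mean score between the treatment and control groups."""
    return mean(treatment_scores) - mean(control_scores)


# Hypothetical scores, invented purely for illustration.
# Method #1: tasks correctly performed per hour (e-packing system).
epacking = {
    "treatment": [52, 55, 49, 58, 56],
    "control": [48, 50, 47, 51, 49],
}
# Method #2: supervisor rating of packing speed (assumed 1-10 scale).
supervisor = {
    "treatment": [7, 8, 6, 8, 7],
    "control": [7, 7, 6, 8, 7],
}

effect_epacking = treatment_effect(epacking["treatment"], epacking["control"])
effect_supervisor = treatment_effect(supervisor["treatment"], supervisor["control"])

print(f"Method #1 (e-packing system): {effect_epacking:+.1f} tasks per hour")
print(f"Method #2 (supervisor rating): {effect_supervisor:+.1f} rating points")
```

Because the two methods use different scales, the raw differences are not directly comparable. The point of the comparison is whether both methods lead to the same conclusion about the presence and direction of an effect; if they do not, a generalisation based on only one of them may hold only for that method.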

The 'real world' versus the 'experimental world' (and external validity)

When making generalisations, whether to a wider population, or across populations, treatments, contexts/settings, or time, we are making generalisations from the experimental world to the real world. After all, irrespective of the quantitative research design that you use (i.e., descriptive, experimental, quasi-experimental or relationship-based research design), whenever participants know that they are taking part in research, there is the potential for that experimental world, rather than the independent variables, to influence the research findings (i.e., the scores on the dependent variable).

This raises a broad threat to external validity; that is, can we make generalisations from individuals that have experienced treatments (i.e., took part in the experiment) to people in the real world that have not experienced the same treatments (i.e., people who were not part of the experiment)?

In answering this question, there are three broad effects that may threaten the external validity of your results: testing effects, experimental effects and experimenter effects. Each is discussed in turn:

Testing effects

Testing effects, also known as order effects, multiple treatment interference, and reactive or interaction effects of testing, only occur in experimental and quasi-experimental research designs that have more than one stage; that is, research designs that involve a pre-test and a post-test. In such circumstances, the fact that the person taking part in the research is tested more than once can influence their behaviour/scores in the post-test, which confounds the results; that is, the differences in scores on the dependent variable between the groups being studied may be due to testing effects rather than the independent variable. Some of the reasons why testing effects occur include learning effects (practice or carry-over effects) and experimental fatigue. Each is discussed in turn:

Learning effects (practice or carry-over effects)
Learning effects, also known as practice effects or carry-over effects, result in increased post-test performance (i.e., higher scores on the dependent variable) because participants have become familiar with some aspect of the experiment (e.g., its subject matter) from the pre-test. As a result of these learning effects, during the post-test, participants may:
- Understand the format of the experiment
- Understand the purpose of the experiment
- Become familiar with the testing environment
- Develop a strategy/approach to do better/worse in the experiment (or moderate their outcome)
- Become less anxious about the experiment
Where learning effects relate to the measurement procedure (e.g., the first three points above), this is often called habituation. Where such learning effects relate to memory effects (e.g., the last two points above), this is often called sensitization.

Experimental fatigue / General experiences during the experiment

Experimental fatigue reflects general experiences that take place during the experiment that lead to physical and/or mental fatigue. This could be due to a particular treatment, which may be physically and/or mentally demanding, or simply due to the fact that being part of a research project, which is unusual for most participants, can be tiring.

Testing effects are not a problem in all studies. For example, as a 'general rule of thumb', testing effects are less likely to be a threat to external validity where there has been a large time period between the pre-test and post-test compared with experiments having a short interval between tests. You need to ask yourself: To what extent are learning effects a problem for the post-test in my experiment?
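
A rough way to gauge how large testing effects might be is to look at how the control group's scores change between the pre-test and the post-test, since that group received no treatment. The sketch below uses hypothetical scores invented for this example, and the function name `mean_gain` is an assumption. It is a simple illustration rather than a full analysis, and it assumes that nothing other than repeated testing changed for the control group between the two tests.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average (post-test minus pre-test) change across paired participants."""
    return mean([post - pre for pre, post in zip(pre_scores, post_scores)])


# Hypothetical paired scores, invented for illustration only.
control_pre = [40, 42, 38, 45, 41]
control_post = [43, 44, 41, 47, 44]
treatment_pre = [41, 39, 43, 40, 42]
treatment_post = [49, 47, 50, 48, 51]

# Gain in the untreated group: a rough indication of testing effects.
control_gain = mean_gain(control_pre, control_post)
# Gain in the treated group: treatment plus any testing effects.
treatment_gain = mean_gain(treatment_pre, treatment_post)

# Treatment gain after subtracting the gain seen without any treatment.
adjusted_effect = treatment_gain - control_gain

print(f"Control group gain:   {control_gain:.1f}")
print(f"Treatment group gain: {treatment_gain:.1f}")
print(f"Adjusted effect:      {adjusted_effect:.1f}")
```

If the control group's gain is large relative to the treatment group's gain, learning effects or fatigue are a plausible explanation for part of the post-test difference, and the results should be generalised with more caution.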

Experimental effects

Participants can behave differently when they are taking part in research compared to the way that they would behave in everyday life. Some of these differences in behaviour result from subject effects/novelty effect, compensatory rivalry, demoralization and compensation. Each is briefly discussed in turn:

Subject effects / novelty effect
Subject effects (or participant reactivity) occur when the way that participants behave in an experiment is different from the way that they would normally behave. These changes in behaviour reflect participants' knowledge that they are being studied, which may lead to them acting aggressively/defensively, cooperatively/uncooperatively, or in some other way that affects their score on the dependent variable. Participants may behave differently in order to mirror the behaviour that they think the researcher wants to see, or they may do it for their own reasons.
People may also respond differently when taking part in research compared with everyday life because of the novelty effect; the idea that research is a novel experience for most people, which will influence their behaviour. People may be excited by the opportunity to take part in research, or taking part may create anxiety or fear.
Irrespective of the reasons why people change their behaviour, such behavioural modification can threaten the internal validity of the study because the way that participants reacted may explain the changes in the dependent variable rather than the treatment (i.e., the independent variable). At the same time, it threatens the external validity of the study because we cannot make generalisations to real life situations from experimental results that poorly reflect real life because of such subject and novelty effects.

Compensatory rivalry
In experimental and quasi-experimental research designs where there is a treatment and control group, participants can sometimes become competitive when not included in the treatment group. As a result, they exert additional effort, which may improve the score on the dependent variable for the control group compared with normal conditions (i.e., compared with what is typical or expected for such a group). This can even happen when there are two treatment groups and no control group, so long as one of the groups is receiving a less attractive treatment/intervention. It is known as compensatory rivalry (or compensatory equalization of treatments), and it threatens not only the internal validity of your findings, but also their external validity.

Demoralization
Demoralization (or resentful demoralization; and in some cases, compensatory demoralization) can happen in experimental research when participants are assigned to the control group rather than the treatment group. This is not always the case, especially where there are no negative outcomes associated with control group membership. However, there are instances where being assigned to the control group can be viewed as negative, leading to feelings of anger, demoralization, resentment and neglect, amongst other negative feelings. This can affect the scores on the dependent variable for the control group, reducing the internal validity of the findings, and therefore, the ability to make generalisations from these findings.

Compensation
Sometimes you will choose (or need) to compensate participants to encourage them to take part in your research. Broadly, such compensation can be viewed as either general compensation or control group compensation:
- General compensation
  All participants are rewarded simply for taking part, irrespective of whether they are in the control group or the treatment group. For example, study participants may be given money, physical items (e.g., clothes, an iPod, etc.), or some other form of compensation if they complete an online survey or take part in a physiological or psychological experiment.
- Control group compensation
  The control group sometimes misses out on a treatment, which leads to the demoralization (or resentful demoralization; and in some cases, compensatory demoralization) that we discussed in the previous section [see the section: Demoralization and internal validity]. This can lead to a concern amongst researchers, whether because of the threat to internal validity that this causes, or a general sympathy with the control group members. When this happens, researchers can feel pressured to provide the control group with compensation that the treatment group does not receive. This may be general compensation for taking part, as discussed above, or it may be another form of compensation. However, the problem with this form of one-sided compensation is that the control group is no longer a control group in the true sense of the word because they have still been compensated in some way. This could affect the differences in the scores on the dependent variable between the control group and treatment group, reducing the internal validity of the study.
Ultimately, any form of compensation, which would not likely be given in the real world, threatens your ability to make generalisations from your results; that is, it threatens the external validity of your study.

Experimenter effects

Just as experimenter characteristics can threaten the internal validity of your research, they can also threaten its external validity. An experimenter effect, which results in experimenter bias, can threaten external validity across all types of experimental and quasi-experimental research design. Such an experimenter effect is typically unintentional, but arises because of (a) the personal characteristics of the researcher, which influence the choices made during a study; and (b) non-verbal cues that the researcher gives out that may influence the behaviour and responses of participants. Some of the more generic personal characteristics that may lead to bias include the experimenter's age, class, gender, race, and so forth.

Furthermore, in quantitative research, you often make predictions about the outcome of an experiment. These predictions may come in the form of directional hypotheses. We call something a directional hypothesis, rather than a non-directional hypothesis, because we are making a prediction about the direction of the outcome of an experiment. For example: As physical activity increases, risk of heart disease decreases; As pay increases, employee motivation increases [see the section on Research (and null) hypotheses].

Seldom will you design an experiment thinking that nothing will happen, or having no idea about the potential outcome. For example, we think that a new teaching method will improve student exam performance, so we design an experiment to find out if this is the case; we think that introducing background music into a packing facility will increase employee task performance, so we design an experiment to test our directional hypothesis.

Since you, as the experimenter, may make such predictions, it is possible that certain personal biases will enter the research process. These personal biases are often exhibited in the experimenter's behaviour, which may include being more/less helpful/friendly/informative towards the different groups involved in the study in order to influence their behaviour. Whilst this may be an unconscious form of bias, it can lead to changes in the dependent variable that are not due only to the treatment (i.e., the independent variable), but also experimenter effects. If the measurement of the dependent variable is more qualitative, this may pose a more significant threat to internal validity (e.g., the experimenter makes the judgement of a student's performance on a scale of 1-10 instead of this measurement being less subjective, such as using a measurement device like a written test or behavioural scale).

Experimenter bias becomes a threat to external validity because the results that are obtained in a given study may simply reflect the personal biases of the researcher. If another researcher were to carry out the same study using a sample with very similar characteristics and the same research methods, different results may be obtained. Therefore, our ability to generalise from the results of a study that is subject to experimenter bias is threatened.