External validity | Lærd Dissertation

Time and external validity

Time affects our ability to make generalisations. When making generalisations that involve time, we need to think not only about the threats to external validity that arise from generalising across time, but also about the fact that time is part of the treatment (i.e., intervention) within quantitative research (e.g., a 15 week teaching period versus a 3 year teaching period). We need to remember that the relationships and differences that we find in research reflect a snapshot in time; that is, experiments (a) are conducted within a specific time period (e.g., the 15 weeks in Study #1), and (b) take measurements that are time-dependent, obtaining data that could only be collected within that time period (e.g., the exam scores in Study #1 reflected the students' ability at that point in time). Things change over time, whether due to events (e.g., nuclear safety incidents, wars, or a lack of sleep that interrupts the treatment) or more fundamental social and cultural changes (e.g., migration, governmental change such as a move from autocratic to democratic rule, changing economic prospects, regulatory and legal changes, etc.). The question arises: Would the same result have been found in the future, or if the treatment had involved a longer or shorter time period?

In this section, we look at the impact of history effects and maturation on external validity. History effects and maturation affect the external validity of a study because four things are often not the same between studies: (a) when an experiment takes place, (b) the length of the interval between a pre-test and post-test, (c) the overall duration of an experiment, and (d) the impact of time on a given sample. As a result, we have to question whether the results we obtain in our study can be generalised to other studies that differ in these ways. In the sections that follow, we look at history effects and maturation in turn:

History effects

History effects refer to events that happen in the environment that change the conditions of a study, affecting its outcome. Such a history event can happen before the start of an experiment, or between the pre-test and post-test. History events are typically framed as threats to internal validity rather than external validity. They affect the outcome of an experiment in a way that threatens its internal validity when the history effect (a) changes the scores on the independent and dependent variables, and (b) changes the scores of one group more than another (e.g., increase the scores of the treatment group compared with the control group or a second treatment group) [see the section, History effects and internal validity in the article, Internal validity, for a detailed explanation of how history effects threaten internal validity]. Take the following example of a threat to internal validity, which will help us to explain how history effects become threats to external validity:

Study #1
The impact of exercise (i.e., fitness level) on how well people sleep (i.e., sleep quality)
NOTE: For the purposes of this example, let's imagine that the sample of people investigated were students living in halls of residence.

Event: A student burns some toast after a drunken night out, which sets off the fire alarm in the halls of residence in the middle of the night. The fire service are called out, there are loud sirens going off, all the students have to go outside, and it takes 2 hours before the students can go back into the building to return to sleep. Amongst those students living in the halls of residence that night are the participants taking part in a study investigating the impact of exercise on how well people sleep (i.e., their sleep quality). The pre-test for the study starts the next morning.

The experiment: The students taking part in the study arrive at the sports science labs in the morning to complete their first exercise test to show their level of fitness, and complete a questionnaire about their level of sleep quality. Participants are divided into two groups: (a) a control group that receives no intervention, meaning that the participants simply carry on their normal lives until the post-test measurement is taken; and (b) a treatment group that receives an intervention consisting of 6 weeks of personal training to help participants improve their fitness levels. The results from these two measurement procedures (i.e., the exercise test and sleep questionnaire) provide the data for the pre-test of the experiment.

History effects: Since the participants suffered from a lack of sleep the night before, including an interruption to their sleep patterns, due to the fire alarm, there is the potential for the exercise/fitness scores (i.e., the independent variable) and the sleep quality scores (i.e., the dependent variable) to be lower than they would be normally. Therefore, when the post-test is conducted, and the students complete their second exercise test and second sleep quality questionnaire, the difference between the two sets of scores (i.e., the pre-test and post-test scores) is likely to be greater than it would be normally. Let's imagine that, when comparing the pre-test and post-test scores, we concluded that an increase in exercise improved sleep quality. We could not be sure that this result was due to the increase in exercise and not to the fact that the pre-test scores were lower than normal because of the fire alarm, which left participants more tired and physically fatigued than usual. This event, which occurred prior to the study taking place, acted as a history effect that threatened the internal validity of the study.
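The arithmetic behind this history effect can be sketched with made-up numbers. The short Python example below assumes a sleep quality score on a 0-100 scale and a 10-point drop caused by the disrupted night; these figures, and all of the variable names, are hypothetical and do not come from the study.

```python
def gain(pre_test: float, post_test: float) -> float:
    """Return the change in a score from pre-test to post-test."""
    return post_test - pre_test


# Hypothetical sleep quality scores on a 0-100 scale (illustrative only).
usual_pre_test = 60.0   # score a participant would record after a normal night
alarm_penalty = 10.0    # assumed drop caused by the fire alarm
post_test = 70.0        # score after the 6 weeks of personal training

# The pre-test actually recorded the morning after the fire alarm.
observed_pre_test = usual_pre_test - alarm_penalty

observed_gain = gain(observed_pre_test, post_test)   # what the study records
undisturbed_gain = gain(usual_pre_test, post_test)   # what a normal night would give
inflation = observed_gain - undisturbed_gain         # part of the gain owed to the alarm

print(observed_gain, undisturbed_gain, inflation)
```

Under these assumptions, half of the recorded improvement reflects the depressed pre-test rather than the training, which is why the pre-test to post-test difference alone cannot be attributed to the exercise.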

Whilst the fire alarm example illustrates how a history effect can become a threat to internal validity, the same history effect also acts as a threat to external validity. In experimental research, we want to be able to make generalisations from our results to some wider population. For example, we want to be able to say that an increase in exercise improves sleep quality. When we say this, we want to be referring not only to the small group of students in our experiment, but to a much wider population (e.g., all students; or perhaps all people who share the same characteristics as our sample). However, how can we be sure that the results would be the same in future studies where such a history event did not occur?

Maturation

If the experiment in your dissertation focuses on people (i.e., people are the population you are interested in), maturation is likely to threaten the internal validity of your findings. This has to do with time and the effect that time has on people. After all, experiments do not happen overnight, but often over a period of time, whether days, weeks, a few months, or in some cases, years. Whilst experiments at the undergraduate and master's dissertation level tend to last no longer than 2-3 months (at least the data collection phase), there are a number of changes that can take place within such short timeframes. During such periods of time, people change, and such change can affect your findings. This is the case for all types of experiment, whether in the physical or social sciences, psychology, management, education, or another field of study. Let's look at some examples of maturation effects in the short-term and long-term:

Short-term changes and their effects
There are a number of maturation effects that can occur in the very short term; that is, within a few hours or days. People's behaviour can change. For example, they can go from being in a good mood to a bad one. Participant tiredness, boredom, hunger and inattention can also set in. These factors can be driven by the research participant or by the experiment. The participant may have stayed up late the night before an experiment, causing tiredness; the participant may be thinking about an upcoming coursework deadline or exam, causing inattention; and so forth. Such participant-led factors can be difficult to control, reducing the internal validity of an experiment. However, sometimes these factors (i.e., tiredness, boredom, hunger, inattention, etc.) are the result of the experiment itself.

Longer-term changes and their effects
Other maturation effects can result from longer-term changes, such as getting older, becoming better educated, becoming more affluent, and so forth. However, even within experiments lasting less than a year, and perhaps even just a few months, it is possible for these factors to affect your findings. For example, people can get a new job with a relatively significant pay rise, or they may come into some inheritance money. They may start taking some form of further education, whether within the classroom, at home, or in work. At the same time, getting older can be an issue. Indeed, experiments that focus on people who are elderly, as well as those that involve young children, have the potential to suffer from maturation effects because small changes in age can have a particularly marked impact on a range of physical, social, behavioural, and psychological factors. For example, as people become elderly, there can be a more rapid deterioration in certain physical characteristics such as vision, hearing, taste, and even memory. This may negatively impact their performance during an experiment. Amongst young children, there is a greater propensity for learning to take place (acquiring new knowledge and skills), as well as for becoming physically stronger in a short space of time. Such maturation effects, in addition to (or rather than) the treatment condition, may change the performance of participants in the post-test relative to the pre-test.

The question arises: How confident are you that the observed changes in the dependent variable are due to the treatment (i.e., intervention) and not maturation? In principle, such confidence will decrease as the experiment goes on. However, it is not as simple as saying that the longer an experiment, the greater the potential maturation effect. You need to look at the nature of your research, and examine whether maturation is likely to be a problem. At the same time, maturation does not only affect the internal validity of your findings, but can also threaten the external validity of your study. This reflects the challenges not only of making generalisations across time, but also of making generalisations across treatments. To imagine some of the potential problems, let's think back to Study #1:

Study #1:
The impact of teaching method on exam performance

In this study, we provided the treatment group with seminars and lectures (and the control group with just lectures) for a period of 15 weeks. However, what if we had conducted this experiment over a period of 3 to 4 years, the length of a typical undergraduate degree? On the one hand, the results may have been less exposed to short-term maturation effects (e.g., participant tiredness, boredom, hunger and inattention). However, we would have been more exposed to longer-term maturation effects (e.g., participants being better educated, more skilled at learning, arguably more socially/emotionally mature, etc.).

If the longer-term maturation effects highlighted above altered the impact that the two different teaching methods had on exam performance, this would call into question our ability to make generalisations from the findings across time. For example, we could not argue that seminar attendance increased exam performance irrespective of the amount of time that students received seminar classes in addition to lectures. Perhaps the increase in performance is greatest over the long-term; or perhaps there is a greater increase in the short-term, followed by a smaller increase over the long-term. Since we did not examine the impact of teaching method on exam performance across different time periods, we have to be careful when making generalisations from what is, in effect, a snapshot in time, to situations across time.
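To see why a single snapshot cannot settle this, the following Python sketch compares two hypothetical patterns for how the benefit of seminars might develop over time. Both are scaled to give the same result at 15 weeks, the only point that Study #1 measured. The 3-point gain, the `tau` constant, and both curve shapes are assumptions made purely for illustration, not findings.

```python
import math

OBSERVED_WEEKS = 15   # length of the teaching period in Study #1
OBSERVED_GAIN = 3.0   # hypothetical exam-score gain measured at 15 weeks


def steady_gain(weeks: float) -> float:
    """Hypothetical pattern A: the gain keeps growing at a constant rate."""
    return OBSERVED_GAIN * (weeks / OBSERVED_WEEKS)


def levelling_gain(weeks: float, tau: float = 10.0) -> float:
    """Hypothetical pattern B: a fast early gain that then levels off.

    tau (in weeks) is an assumed constant that controls how quickly the
    gain levels off; the curve is scaled to match the 15-week snapshot.
    """
    share = (1 - math.exp(-weeks / tau)) / (1 - math.exp(-OBSERVED_WEEKS / tau))
    return OBSERVED_GAIN * share


# Compare the two patterns at 15 weeks, 1 year, and 3 years.
for weeks in (15, 52, 156):
    print(weeks, round(steady_gain(weeks), 1), round(levelling_gain(weeks), 1))
```

The two patterns are indistinguishable at 15 weeks but diverge sharply over 3 years, so the 15-week result on its own gives no basis for choosing between them.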