External validity

In quantitative research, external validity is important because we want to be able to say that the conclusions we reached in our dissertation can be generalised. After all, the results that we obtain are based solely on a sample (e.g., 100 employees in a bank, 350 children in a school district, 15 charities in the UK, etc.). However, this sample comes from a wider population that we are interested in (e.g., all 8,000 employees in the bank, all 100,000 children in the school district, all 3,500 charities in the UK, etc.). Sometimes, we are only interested in how well the results from the sample can be generalised to the population it was drawn from. However, we may also be interested in knowing if our results can be generalised across populations, treatments, settings/contexts and time. External validity therefore asks the question: To what extent can our conclusions be generalised (a) to a wider population, and/or (b) across populations, treatments, settings/contexts and time?

Before trying to answer this question, it is worth noting that external validity can affect dissertations that are guided by a quantitative, qualitative or mixed methods research design [see the section on Research Designs if you are unsure which research design your dissertation follows]. However, if your dissertation was guided by a qualitative research design, the idea of external validity is often referred to as transferability, which, whilst similar to external validity, is not the same.

In quantitative research designs, the level of external validity will be affected by (a) the type of quantitative research design you adopted (i.e., descriptive, experimental, quasi-experimental or relationship-based research designs), and (b) potential threats to external validity that may have influenced your ability to make generalisations. In this article, we (a) explain what external validity is, and (b) discuss and provide examples of the various threats to external validity.

What is external validity?

Irrespective of the type of quantitative research design we choose to use in research (i.e., descriptive, experimental, quasi-experimental or relationship-based research designs), we make knowledge claims based on our results (e.g., exercise reduces heart disease, redundancies lower employee motivation, etc.). For example, if an experimental study found that students who attend seminars in addition to lectures get better marks than those students who only attend lectures, we may argue that seminar attendance (i.e., the independent variable) increases (i.e., causes an increase in) exam performance (i.e., the dependent variable) [see the section on Research Designs if you are unsure about the different types of quantitative research design]. However, the question arises: Do the relationships or differences we find in our research hold over different populations, treatments, settings/contexts and time?

Before addressing this question, let's remember that one of the main goals of quantitative research is to be able to make generalisations from the results of a single study. It is important to recognise that we are making generalisations from the observed (i.e., the sample we studied) to things that are unobserved (i.e., a wider population that has not been measured); that is, the 'real world'. The question arises: What do we make generalisations to? In answering this question, we can think of two types of generalisations (Cook and Campbell, 1979):

Generalisations to the population
Generalisations across populations, treatments, settings/contexts, and time

In order to explain these two types of generalisations, we use Study #1 below to provide some background:

Study #1:
The impact of teaching method on exam performance

We want to examine how two different teaching methods (i.e., the independent variable) affect the exam performance (i.e., the dependent variable) of university students. More specifically, we want to know if the addition of a seminar class to traditional lecturing improves exam performance, and if so, by how much. This is important because the university only has a limited budget, so it would not want to add seminar classes to lectures if students' exam performance was not significantly improved as a result. The course in question is Research Methods 101.

Students took an exam at the beginning of the course (i.e., the pre-test) to determine their general aptitude for the subject matter (i.e., their natural ability in Research Methods). This was done to ensure that the two groups being investigated (i.e., the control group and treatment group) were more or less equal in terms of natural ability. Each group consisted of 50 students who were randomly assigned to their respective groups. For the next 12 weeks (i.e., the duration of the course), the control group were given the "normal" teaching method, which consisted of two 1-hour lectures each week. During this same period, the treatment group were given the same two 1-hour lectures each week, but also attended one 1-hour seminar each week. At the end of the 12 weeks, the students from the control group and the treatment group were given the same Research Methods 101 exam (i.e., the post-test). The goal of the experiment was to compare the scores on the dependent variable (i.e., exam performance) between the two groups (i.e., the control and treatment groups).
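To make the random assignment step more concrete, the following is a minimal sketch in Python of how 100 students could be split at random into a control group and a treatment group of 50 each. The student identifiers, the function name assign_groups and the seed value are our own assumptions for illustration; they are not part of Study #1.

```python
import random


def assign_groups(student_ids, seed=None):
    """Randomly split students into two equal-sized groups.

    Hypothetical helper for illustration only; Study #1 does not
    specify how the random assignment was actually carried out.
    """
    rng = random.Random(seed)      # a seeded generator makes the split repeatable
    shuffled = list(student_ids)   # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


# Hypothetical identifiers for the 100 students in the sample
students = [f"S{i:03d}" for i in range(1, 101)]
groups = assign_groups(students, seed=42)

print(len(groups["control"]), len(groups["treatment"]))
```

Because every student has the same chance of ending up in either group, any pre-existing differences in natural ability should, on average, be spread evenly across the two groups.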

When the pre- and post-test scores of the control group (i.e., lectures only) and treatment group (i.e., lectures and seminars) are compared, the results suggest that students who received lectures and seminars (i.e., the treatment group) outperformed the students who only received lectures (i.e., the control group) by an average (i.e., mean) of 6.3 percentage points (on an exam marked out of 100%). As such, we may argue that seminar attendance (i.e., one level of the independent variable) increases (i.e., causes an increase in) exam performance (i.e., the dependent variable).
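One simple way to make this kind of comparison is to calculate how much each student improved between the pre-test and the post-test, and then compare the mean improvement of the two groups. The sketch below illustrates the arithmetic in Python. The scores are invented for illustration (five students per group rather than 50) and do not reproduce the result reported for Study #1; the function name mean_gain is also our own.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Return the mean improvement (post-test minus pre-test) for one group."""
    gains = [post - pre for pre, post in zip(pre_scores, post_scores)]
    return mean(gains)


# Invented scores (%), for illustration only
control_pre = [50, 55, 60, 52, 58]
control_post = [58, 62, 65, 60, 55]
treatment_pre = [51, 54, 61, 53, 56]
treatment_post = [62, 64, 70, 61, 68]

control_gain = mean_gain(control_pre, control_post)
treatment_gain = mean_gain(treatment_pre, treatment_post)

# A positive difference means the treatment group improved more on average
difference = treatment_gain - control_gain
print(control_gain, treatment_gain, difference)
```

A difference calculated in this way only describes the sample. Whether it is large enough to be statistically significant would need to be assessed with an appropriate statistical test, which this sketch does not attempt.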

Drawing on Study #1 above, let's imagine how we may want to make generalisations from these results.

Generalisations to the population

When we study a sample of a population, the immediate task is often to analyse the data that we have collected from that sample. In the case of the 100 students that make up our sample in Study #1 above (i.e., 50 students in the treatment group and 50 students in the control group), we would analyse the data that we had collected on their exam performance.

This tells us something interesting about the sample, but we actually want to know about the population, not just the sample. We want to know if our findings from the sample can be generalised to the population. In Study #1, the population we were examining was the undergraduate students at a single university in the United Kingdom, which had around 10,000 undergraduate students in total.

Our study is externally valid if we can be confident that the conclusions we made (i.e., that seminar attendance increases exam performance) hold for the 10,000 undergraduate students at the university, not just the 100 students involved in the sample. To have confidence that this was the case, we would need to ensure that our sample closely mirrored the population that was being studied; that is, we would need confidence that our sample shared similar characteristics to the population, such as the ratio of males to females, the age of students, the subjects that they studied, and so forth.
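A rough first check of whether a sample mirrors its population is to compare the share of each category in the sample with the known share in the population. The sketch below does this in Python for a single characteristic. All of the figures are hypothetical (Study #1 does not report the characteristics of the students), and the function name proportion_gaps is our own.

```python
def proportion_gaps(sample_counts, sample_size, population_shares):
    """Return sample share minus population share for each category.

    Values close to zero suggest the sample mirrors the population
    on that characteristic; large gaps suggest it does not.
    """
    gaps = {}
    for category, population_share in population_shares.items():
        sample_share = sample_counts.get(category, 0) / sample_size
        gaps[category] = sample_share - population_share
    return gaps


# Hypothetical figures, for illustration only
population_shares = {"female": 0.55, "male": 0.45}  # assumed shares among the 10,000 students
sample_counts = {"female": 60, "male": 40}          # assumed counts among the 100 sampled students

gaps = proportion_gaps(sample_counts, 100, population_shares)
for category, gap in gaps.items():
    print(f"{category}: {gap:+.2f}")
```

The same comparison could be repeated for other characteristics, such as age bands or subject studied. It is only a descriptive check; how large a gap is acceptable remains a judgement for the researcher.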