In quantitative research, the concept of **external validity** is important because we want to be able to say that the conclusions we made in our dissertation can be **generalised**. We may want to make **generalisations (a)** to a **wider population**, and/or **(b) across populations**, **treatments**, **settings/contexts** and **time**. After all, in quantitative research, the results that we obtain are based solely on a **sample** (e.g., of 100 employees in a bank, 350 children in a school district, 15 charities in the UK, etc.). However, this **sample** comes from a wider **population** that we are interested in (e.g., all 8,000 employees in the bank, all 100,000 children in the school district, all 3,500 charities in the UK, etc.). Sometimes, we are only interested in **how well** the sample can be generalised to the population it was drawn from. However, we may also be interested in knowing if our results can be generalised across populations, treatments, settings/contexts, and time. External validity asks the question: To what extent can our conclusions be generalised **(a)** to a wider population, and/or **(b)** across populations, treatments, settings/contexts, and time?

Before trying to answer this question, it is worth noting that external validity is something that can affect dissertations that are guided by a **quantitative**, **qualitative** or **mixed methods research design** [see the section on Research Designs if you are unsure which research design your dissertation follows]. However, if your dissertation was guided by a **qualitative research design**, the idea of external validity is often referred to as **transferability**, which, whilst similar to external validity, is not the same thing.

In **quantitative research designs**, the level of external validity will be affected by **(a)** the **type** of quantitative research design you adopted (i.e., **descriptive**, **experimental**, **quasi-experimental** or **relationship-based** research designs), and **(b)** potential **threats to external validity** that may have influenced your ability to make generalisations. In this article, we **(a)** explain what external validity is, and **(b)** discuss and provide examples of the various threats to external validity.

Irrespective of the type of **quantitative research design** we choose to use in research (i.e., **descriptive**, **experimental**, **quasi-experimental** or **relationship-based** research designs), we make **knowledge claims** based on our results (e.g., exercise reduces heart disease, redundancies lower employee motivation, etc.). For example, if an experimental study found that students who attend seminars in addition to lectures get better marks than students who only attend lectures, we may argue that **seminar attendance** (i.e., the **independent variable**) increases (i.e., **causes** an increase in) **exam performance** (i.e., the **dependent variable**) [see the section on Research Designs if you are unsure about the different types of quantitative research design]. However, the question arises: Do the **relationships** or **differences** we find in our research **hold** over different **populations**, **treatments**, **settings/contexts** and **time**?

Before addressing this question, let's remember that one of the main goals of quantitative research is to be able to make **generalisations** from the results of a single study. It is important to recognise that we are making generalisations from the **observed** (i.e., the sample we studied) to things that are **unobserved** (i.e., a wider population that has not been measured); that is, the 'real world'. The question arises: What do we make generalisations to? In answering this question, we can think of two types of generalisations (Cook and Campbell, 1979):

1. Generalisations to the population

2. Generalisations across populations, treatments, settings/contexts, and time

In order to explain these two types of generalisations, we use **Study #1** below to provide some background:

**Study #1: The impact of teaching method on exam performance**

We want to examine how two different **teaching methods** (i.e., the independent variable) affect the **exam performance** (i.e., the dependent variable) of university students. More specifically, we want to know if the addition of a seminar class to traditional lecturing improves exam performance, and if so, by how much. This is important because the university only has a limited budget, so it would not want to add seminar classes to lectures if students' exam performance was not significantly improved as a result. The course in question is **Research Methods 101**.

Students took an exam at the beginning of the course (i.e., the **pre-test**) to determine their general aptitude for the subject matter (i.e., their natural ability in **Research Methods**). This was done to ensure that the two groups being investigated (i.e., the control group and treatment group) were more or less equal in terms of natural ability. Each group consisted of 50 students who were **randomly assigned** to their respective groups. For the next 12 weeks (i.e., the duration of the course), the **control group** were given the "normal" teaching method, which consisted of two 1-hour long lectures each week. During this same period, the **treatment group** were given the same two 1-hour lectures each week, but also attended one 1-hour seminar. At the end of the 12 weeks, the students from the control group and the treatment group were given the same **Research Methods 101** exam (i.e., the **post-test**). The goal of the experiment was to compare the differences in the scores on the dependent variable (i.e., exam performance) between the two groups (i.e., the control and treatment groups).
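The random assignment step described above can be sketched in a few lines of Python. Note that the student IDs and the random seed below are hypothetical, used purely to illustrate the shuffle-and-split procedure:

```python
import random

random.seed(42)  # fixed seed so this illustrative sketch is reproducible

# Hypothetical roster of 100 student IDs for Research Methods 101
students = list(range(1, 101))

# Randomly assign students to groups: shuffle the roster, then split 50/50
random.shuffle(students)
control_group = students[:50]    # lectures only
treatment_group = students[50:]  # lectures plus one seminar

print(len(control_group), len(treatment_group))  # 50 50
```

Because assignment is random rather than based on any student characteristic, any pre-existing differences between students should, on average, be spread evenly across the two groups.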

When the pre- and post-test scores of the control group (i.e., lectures only) and treatment group (i.e., lectures and seminars) are compared, the results suggest that students who received lectures and seminars (i.e., the treatment group) outperformed the students who only received lectures (i.e., the control group) by an average (i.e., **mean**) of 6.3% (out of 100%). As such, we may argue that **seminar attendance** (i.e., one **level** of the independent variable) increases (i.e., **causes** an increase in) exam performance (i.e., the dependent variable).
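The comparison of group means described above can be illustrated with a short sketch. The post-test scores below are entirely hypothetical (ten per group rather than fifty, chosen so the difference in means comes out at 6.3 percentage points, matching the result quoted in the text):

```python
from statistics import mean

# Hypothetical post-test scores (out of 100), for illustration only
control = [62, 58, 71, 65, 60, 68, 55, 63, 70, 59]    # lectures only
treatment = [69, 66, 75, 72, 64, 74, 61, 70, 77, 66]  # lectures + seminars

# The quantity of interest: the difference between the group means
diff = mean(treatment) - mean(control)
print(f"Mean difference: {diff:.1f} percentage points")
```

In a real analysis we would also test whether this difference is statistically significant (e.g., with an independent-samples t-test) rather than relying on the raw difference in means alone.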

Drawing on **Study #1** above, let's imagine how we may want to make generalisations from these results.

When we study a **sample** of a **population**, the immediate task is often to **analyse** the **data** that we have collected from that **sample**. In the case of the 100 students that make up our sample in Study #1 above (i.e., 50 students in the treatment group and 50 students in the control group), we would analyse the data that we had collected on their exam performance.

This tells us something interesting about the **sample**, but we actually want to know about the **population**, not just the sample. We want to know if our **findings** from the **sample** can be **generalised** to the **population**. In Study #1, the **population** we were examining was the **undergraduate students** at a single university in the United Kingdom, which had around 10,000 undergraduate students in total.

Our study is **externally valid** if we can be **confident** that the conclusions we made (i.e., that **seminar attendance increases exam performance**) hold for the 10,000 undergraduate students at the university, not just the 100 students involved in the sample. To have confidence that this was the case, we would need to ensure that our sample **closely mirrored** the population that was being studied; that is, confidence that our sample shared **similar characteristics** with the population, such as the ratio of males to females, the age of students, the subjects that they studied, and so forth.
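One simple way to check whether a sample mirrors its population is to compare the two on each characteristic of interest. The characteristics, values and tolerances below are hypothetical, used only to sketch the idea:

```python
# Hypothetical figures: known population characteristics vs the same
# characteristics measured in our sample of 100 students
population = {"proportion_female": 0.52, "mean_age": 21.4}
sample = {"proportion_female": 0.50, "mean_age": 21.1}

# Crude representativeness check: flag any characteristic where the
# sample departs from the population by more than a chosen tolerance
tolerance = {"proportion_female": 0.05, "mean_age": 1.0}

flags = {
    key: abs(sample[key] - population[key]) > tolerance[key]
    for key in population
}
print(flags)  # True would indicate a characteristic worth worrying about
```

Here neither characteristic is flagged, so on these two dimensions the sample looks broadly representative; a real check would cover more characteristics and might use formal tests (e.g., chi-squared goodness-of-fit) rather than ad hoc tolerances.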