External validity | Lærd Dissertation

Generalisations across populations, treatments, settings/contexts, and time

In quantitative research, we sometimes want to generalise not only from our sample to the immediate population that we are studying, but also across populations, treatments, settings/contexts and times. We explain these types of generalisation briefly below:

Generalisations across populations

Whilst we try to make generalisations from the sample to the immediate population in quantitative research, sometimes the goal is to make generalisations across populations. For example, in Study #1 we were interested in undergraduate students at a single university in the United Kingdom. If we wanted to make generalisations across populations, we might ask ourselves: Would the new teaching methods be as effective amongst postgraduate students as undergraduate students at the university?

To answer this question, we need to ask ourselves: How similar are the characteristics of the immediate population to the population(s) that we want to make generalisations to? In Study #1, our population was undergraduate students, so we need to know whether the characteristics of postgraduate students are sufficiently similar to those of undergraduate students to generalise our results to this other population.

A study is only likely to look at certain characteristics of a population; that is, it will not necessarily look for every difference in the relationships studied (usually between two variables) across sample characteristics (e.g., age, gender, attitudes, personality). However, it may be differences in these sample characteristics that limit the generalisability of results to a wider population.

Generalisations across treatments

Experimental and quasi-experimental research designs involve specific treatments. By treatments, we are referring to the interventions that are made in experiments. For example, in Study #1, we gave the 50 students in the control group two 1-hour lectures each week for 12 weeks, whilst we gave the 50 students in the treatment group the same two 1-hour lectures each week, but also one 1-hour seminar each week for the 12-week period. Therefore, the characteristics of the treatment in this experiment include the number of lectures (i.e., 24 lectures), the number of seminar classes (i.e., 12 classes), the interval between each of these lectures/seminars (i.e., 1 week), the length of the lectures (i.e., 1 hour each) and seminars (i.e., 1 hour each), and the time period of the experiment (i.e., 12 weeks).

The question arises: Do the treatment characteristics have to be the same when applied to different populations or settings/contexts to arrive at the same conclusions? In other words, would the results from Study #1 be significantly different if the characteristics of the treatment were altered? By significantly, we mean that the results are sufficiently different that we cannot draw the same conclusions from studies where the characteristics of the treatment are different.

Going back to Study #1, what if the control group and treatment group were given the same amount of learning time? After all, both groups receive two 1-hour lectures each week, but the treatment group receives an additional 1-hour seminar class. Therefore, what if the control group received three 1-hour lectures each week whilst the treatment group still had two 1-hour lectures and one 1-hour seminar? Would Study #1 still show that seminar attendance increased exam performance? Similarly, what if we simply decided to cut the number of seminars in half? Or what if we extended the learning period from 12 weeks to 16 weeks?
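To make these alternatives easier to compare, the short Python sketch below works out the total teaching time implied by each version of the treatment. It is only an illustration of the arithmetic: the Treatment class and its field names are hypothetical, and reading "half the seminars" as one seminar every two weeks is our assumption, not something specified in Study #1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    """Teaching schedule for one group (hypothetical helper for illustration)."""
    lectures_per_week: float
    seminars_per_week: float
    lecture_hours: float = 1.0   # Study #1 used 1-hour lectures
    seminar_hours: float = 1.0   # Study #1 used 1-hour seminars
    weeks: int = 12              # Study #1 ran for 12 weeks

    def contact_hours(self) -> float:
        """Total teaching time received over the whole period."""
        weekly = (self.lectures_per_week * self.lecture_hours
                  + self.seminars_per_week * self.seminar_hours)
        return weekly * self.weeks


# The two groups as described in Study #1.
control = Treatment(lectures_per_week=2, seminars_per_week=0)
treatment = Treatment(lectures_per_week=2, seminars_per_week=1)

# Alternative treatments raised in the text.
control_equal_time = Treatment(lectures_per_week=3, seminars_per_week=0)
# Assumption: "half the seminars" means one seminar every two weeks.
treatment_half_seminars = Treatment(lectures_per_week=2, seminars_per_week=0.5)
treatment_16_weeks = Treatment(lectures_per_week=2, seminars_per_week=1, weeks=16)

for name, group in [
    ("control (original)", control),
    ("treatment (original)", treatment),
    ("control (equal learning time)", control_equal_time),
    ("treatment (half the seminars)", treatment_half_seminars),
    ("treatment (16 weeks)", treatment_16_weeks),
]:
    print(f"{name}: {group.contact_hours()} hours")
```

In the original design, the treatment group receives 36 hours of teaching against 24 hours for the control group, so any difference in exam performance could reflect the extra learning time rather than the seminar format itself. Giving the control group three lectures each week removes that 12-hour gap, which is why this alternative is worth considering.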

The question arises: Why do we care about such differences in the characteristics of treatments? Let's go back to Study #1 again.

In Study #1, we only looked at a single university in the United Kingdom, but imagine that we wanted to make generalisations to a wider population such as all universities in the United Kingdom. Clearly, not all universities in the United Kingdom only provide 1-hour lectures and 1-hour seminars. Some use 2-hour lectures and 45-minute seminars (amongst other combinations). We need to ask ourselves: Would the new teaching method be as effective if the lectures and/or seminars were longer or shorter? We can only make generalisations across treatments if the answer to this question is YES. After all, if the answer is NO, the conclusions from our study cannot be generalised across treatments; that is, our conclusions are not externally valid across treatments.

Generalisations across settings/contexts

At the undergraduate and master's dissertation level, quantitative research typically focuses on a single setting/context, or a small number of settings/contexts. This is often done to control for potential extraneous/confounding variables [see the article: Extraneous and confounding variables], or to reduce research time and costs. The question arises: Would the same result have been found in a different setting/context?

To examine the external validity of a study, researchers often look to carry out the same experiment in different settings. This may mean carrying out the experiment in different organisational types, industries, countries, cultures, and so forth.

If we wanted to make generalisations from the results in Study #1 to another setting/context, we might ask ourselves the following questions: Would the new teaching method be as effective in Australia or the United States as it was in the United Kingdom? Would the new teaching method be as effective if taught online (e.g., through live streaming of lectures and group videoconferencing for seminars) rather than in a traditional, physical setting?

Making generalisations to settings/contexts other than the one that was studied must be done with care, especially if the settings/contexts that you are trying to make generalisations to have little in common with the characteristics of your study, such as the nature of the population you were interested in, the physical environment and location, the cultural setting, and so forth. After all, it may be that differences in the cultural setting where the research took place moderate the relationships or differences that you have discovered in your study. For example, going back to Study #1, we might ask ourselves the question: Would the new teaching method be as effective in universities based in more collectivist cultures (e.g., China) as in more individualistic cultures (e.g., the United Kingdom)?

When in doubt, it is much more prudent to propose such generalisations, support the rationale for such generalisations with theory, and then include them within the Future Research section of your Conclusions chapter (usually Chapter Five: Discussion/Conclusions) of your dissertation.

Generalisations across time

With the exception of longitudinal studies, which are rarely conducted at the undergraduate and master's dissertation level, the results from quantitative research tend to reflect a snapshot in time. By a snapshot in time, we mean that most experiments (a) are conducted within a specific time period (e.g., the 12 weeks in Study #1), and (b) take measurements that are time-dependent; that is, obtain data that could only be collected within that time period (e.g., the exam scores in Study #1 reflected the students' ability at that point in time). Therefore, when we conclude that seminar attendance (i.e., one level of the independent variable) increases (i.e., causes an increase in) exam performance (i.e., the dependent variable), this reflects the lectures and seminars that were given during a 12-week period, and the exam performance of students at the end of that period.

The question arises: Would the results hold over time? In other words, if we conducted this experiment at some point in the future (e.g., 5 years from now), would we get the same results? If we feel that the answer is YES, perhaps because we imagine teaching methods and student ability to be fairly constant over the next 5 years, we could argue that our results are generalisable across time.

However, time affects experimental conditions in different ways, which determines whether generalisations can be made. For example, studies that focus on culture at the national level (e.g., the Chinese culture, German culture) are more likely to be externally valid over time than studies of culture in a single organisation. This is because national cultures often change very little, and when they do, such change tends to take place over decades. By comparison, even an organisation with a strong culture could witness a relatively rapid change (e.g., over months or a few years) if it were acquired by another organisation with a vastly different culture (e.g., an organisation with a power culture taking over an organisation with a people culture).

When making generalisations across time, care must be taken to assess whether the population, treatments and/or settings/contexts are likely to be prone to change over the time period you want to make generalisations to. For example, you are unlikely to make generalisations over all time, but rather a particular time period (e.g., a number of months, a few years, or perhaps even decades).

Further considerations

When deciding whether a study is externally valid, it is the extent to which such generalisations can be made that is important. No study can be completely externally valid. To assess the extent to which generalisations can be made, we have to determine how well the sample (or population) that we have studied represents the wider population (or settings/contexts, treatments or time) we are interested in making generalisations to.

Of course, the extent to which we want our results to be robust across different units, settings, and treatments will vary according to the perspective we take towards research (i.e., a positivist versus a post-positivist research paradigm) [see the section, Research Paradigms, if you do not know the differences between positivism and post-positivism]. As positivists, we would have a greater desire to build grander theories and therefore make much broader generalisations from our results, whilst as post-positivists, we do not have such a desire to build grand theories, but we still want our results to be robust to wider populations, settings, and treatments.

We may think that we can make certain generalisations from our results, but we cannot say with certainty that these generalisations hold. This reflects the fact that there are many threats to external validity that can undermine our results, which are discussed in the next section [see the section: Threats to external validity]. It also reflects the fact that there are different types of quantitative research design (i.e., descriptive, experimental, quasi-experimental and relationship-based research designs), which can make us more or less confident that our conclusions are externally valid.