Internal validity | Lærd Dissertation

Statistical regression and internal validity

Statistical regression (or regression towards the mean) can be a threat to internal validity because the scores of individuals on the dependent variable may not only be due to the natural performance of those individuals, but also to measurement error (or chance). When these scores are particularly high or low (i.e., they are extreme scores), there is a tendency for these scores to move (i.e., regress) towards the mean (i.e., the average score): an individual with an extremely high score during the pre-test measurement of an experiment tends to get a lower score on the post-test measurement; and vice versa, an individual with an extremely low score during the pre-test measurement tends to get a higher score on the post-test measurement. Before we explain when this is a problem, let's look at the following example:

Multiple choice tests
Imagine that a student takes a multiple choice exam on five consecutive days. The exam is on the same subject, but the multiple choice questions are different each time. Also, the student did not do any more revision during those five days. Under this scenario, we would expect the student's exam score (i.e., the dependent variable) to be the function of two things: (a) stable factors: the student's natural performance; that is, the student's skill and knowledge; and (b) unstable factors: chance (or luck), since the student will get the correct answer on some of the multiple choice questions simply by guessing the correct answer. The student's natural performance is likely to remain stable for each of the measurements on the dependent variable (i.e., the total mark that the student gets on each of the five exams), but the amount of luck that the student has in guessing answers to those questions that she doesn't know the answer to is likely to be unstable (i.e., it may change quite a lot for each of the five exams). For the vast majority of students that take the exam, such changes in luck from one exam to the next (i.e., from one measurement to the next) are not a major concern, but for those individuals with extreme scores (i.e., particularly high or low scores), this can cause a problem.
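The multiple choice scenario can be illustrated with a small simulation. This is only a sketch under assumed numbers that do not come from the example above: 2,000 hypothetical students, a 40-question exam with four options per question, and each student knowing between 15 and 25 answers. The function names sit_exam and regression_demo are made up for illustration. Each student sits the exam twice with the same knowledge, so any change between sittings is due to luck alone.

```python
import random
import statistics


def sit_exam(known, n_questions, n_options, rng):
    """Score = questions the student knows + lucky guesses on the rest."""
    unknown = n_questions - known
    lucky = sum(rng.random() < 1 / n_options for _ in range(unknown))
    return known + lucky


def regression_demo(n_students=2000, n_questions=40, n_options=4, seed=1):
    """Return (mean first score, mean second score) for the extreme groups.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_students):
        known = rng.randint(15, 25)  # stable factor: same for both sittings
        first.append(sit_exam(known, n_questions, n_options, rng))
        second.append(sit_exam(known, n_questions, n_options, rng))

    # Rank students by their first score and take the extreme 10% at each end.
    order = sorted(range(n_students), key=lambda i: first[i])
    cut = n_students // 10
    groups = {"lowest 10%": order[:cut], "highest 10%": order[-cut:]}

    result = {}
    for label, members in groups.items():
        mean_first = statistics.mean(first[i] for i in members)
        mean_second = statistics.mean(second[i] for i in members)
        result[label] = (mean_first, mean_second)
    return result


if __name__ == "__main__":
    for label, (m1, m2) in regression_demo().items():
        print(f"{label}: first exam {m1:.1f}, second exam {m2:.1f}")
```

Under these assumptions, the students who scored lowest on the first exam tend to score higher on the second, and the students who scored highest tend to score lower, even though nobody's knowledge changed between the two sittings.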

We can see the problem that this causes in two scenarios: (a) experiments where we are only interested in examining participants whose scores on the dependent variable start with an extreme value; and (b) experiments comparing two groups where one group includes participants whose scores on the dependent variable start with an extreme value, whilst the other group does not.

Experiments where we are only interested in examining participants whose scores on the dependent variable start with an extreme value
For example, let's imagine that we want to examine the impact of two different teaching methods on exam performance amongst remedial maths students; in other words, we want to know which of two different teaching methods is better at improving the maths grades of students that are particularly bad at maths. However, when we focus on this group of participants (i.e., remedial maths students), they are the type of participants that have extreme scores (i.e., extremely low scores, in this case). This causes a problem when we compare the exam performance of the two different groups of students (i.e., one group experiences the same teaching method as normal, the control group, whilst the other group receives the new teaching method, the treatment group).

It is a problem because we do not know whether any differences in the scores on the dependent variable are due to the independent variable (i.e., the different teaching methods) or the unstable factors that are characteristic of participants with these kinds of extreme scores. Also, since the students start with low scores, it is most likely that their scores will only improve. It is possible that the scores are improving not because of the treatment (i.e., the new teaching method), but because the students' scores regress towards the mean (i.e., they are higher in the post-test simply because of the mathematical phenomenon known as statistical regression). This becomes a threat to internal validity.

Experiments comparing two groups where one group includes participants whose scores on the dependent variable start with an extreme value, whilst the other group does not
For example, let's imagine that we want to examine the impact of two different teaching methods on exam performance between average and remedial maths students; in other words, we want to know which of two different teaching methods is better at improving the maths grades of average students compared to remedial students. Building on the previous scenario, we now encounter a new problem. On the one hand, we face the threat to internal validity that arises from the fact that the remedial group of students start with extreme scores on the dependent variable. We know from the scenario discussed above that this is a problem because we do not know whether any differences in the scores on the dependent variable are due to the independent variable (i.e., the different teaching methods) or the unstable factors that are characteristic of participants with these kinds of extreme scores. However, we now add another problem. We are comparing the scores on the dependent variable of a group that does not suffer from such extreme value bias (i.e., the group of average students) to one that does (i.e., the group of remedial students). Since one of the groups includes extreme scores on the dependent variable, there is the potential that the differences in scores on the dependent variable (i.e., exam performance) between the two groups are going to be smaller than they would normally be because of statistical regression; in other words, whilst the exam scores of the average group of students are likely to be relatively stable, the unstable nature of the exam scores for the remedial group of students means that these scores are likely to be higher in the post-test than they would normally be.
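The narrowing gap described above can be sketched with a second simulation. The model and every number in it are assumptions made for illustration only: each hypothetical student's score is a stable skill level plus an independent random error on each test, the remedial group is the bottom 10% on the pre-test, the average group is the middle 20%, and no teaching method is applied at all. The function name gap_demo is also made up.

```python
import random
import statistics


def gap_demo(n_students=5000, seed=7):
    """Return (pre-test mean, post-test mean) for each group, with no treatment."""
    rng = random.Random(seed)
    pre, post = [], []
    for _ in range(n_students):
        skill = rng.gauss(60, 8)             # stable factor (assumed scale)
        pre.append(skill + rng.gauss(0, 8))   # unstable factor on the pre-test
        post.append(skill + rng.gauss(0, 8))  # fresh unstable factor on the post-test

    # Groups are defined by pre-test score only, as they would be in practice.
    order = sorted(range(n_students), key=lambda i: pre[i])
    remedial = order[: n_students // 10]
    average = order[int(0.4 * n_students): int(0.6 * n_students)]

    def mean_of(scores, members):
        return statistics.mean(scores[i] for i in members)

    return {
        "remedial": (mean_of(pre, remedial), mean_of(post, remedial)),
        "average": (mean_of(pre, average), mean_of(post, average)),
    }


if __name__ == "__main__":
    for label, (pre_mean, post_mean) in gap_demo().items():
        print(f"{label}: pre-test {pre_mean:.1f}, post-test {post_mean:.1f}")
```

In this sketch the average group's mean barely moves, whilst the remedial group's mean rises on the post-test, so the gap between the groups shrinks even though neither group received any intervention.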

In assessing whether statistical regression may be a threat to internal validity in your study, ask yourself: Does the sample I am interested in include individuals that are likely to start with extreme scores on the dependent variable (i.e., compared with the norm)? If it does, you may need to take statistical regression into account when designing your study and analysing your results.

Selection biases and internal validity

As the saying goes, "No two people are the same". They differ along a wide range of factors, such as age, behaviour, gender, height, intelligence, and so forth. You cannot eliminate such individual differences from research, but you do need to take them into account when comparing different groups. It is important to reduce individual differences between groups where these individual differences are extraneous variables that differ systematically between the groups [see the article: Extraneous and confounding variables]. In experimental and quasi-experimental research, you need to make sure that the groups are equivalent before you start, or there could be differences between the treatment and control groups (i.e., before any interventions are made) that may explain the differences in scores on the dependent variable. This is known as a selection effect, and it is a threat to the internal validity of your study.

If you want to use an experimental research design, one of the fundamental criteria is the random assignment of participants to the different groups that you are comparing. By random assignment, we mean that each participant has the same chance of being placed in any of the groups, so that the groups being compared can be expected to be similar across a range of general and specific criteria. Some of the more general criteria on which you would want the groups to be similar include factors such as age and gender. However, there may also be specific criteria that you want to take account of, which will depend on the nature of the research you are performing.
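As a minimal sketch of what random assignment can look like in practice, the snippet below shuffles a list of participants and splits it into a treatment group and a control group. The function name randomly_assign, the participant records, and the age range are all hypothetical and only serve the illustration.

```python
import random
import statistics


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two groups.

    Returns (treatment, control). With an odd number of participants,
    the control group receives the extra person.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    # Hypothetical participants: (identifier, age in years).
    rng = random.Random(0)
    people = [(f"P{i:03d}", rng.randint(18, 30)) for i in range(200)]

    treatment, control = randomly_assign(people, seed=42)

    # Check one general criterion (age) to see whether the groups look similar.
    print("mean age, treatment:", statistics.mean(age for _, age in treatment))
    print("mean age, control:  ", statistics.mean(age for _, age in control))
```

Random assignment does not guarantee that the groups are identical on every criterion, especially with small samples, so comparing the groups on the criteria you care about before the study starts remains a sensible check.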

Study #2 recap
For example, if you were interested in the impact of two different teaching methods, namely students receiving lectures and seminar classes compared to students receiving lectures only (i.e., your independent variable) on the exam performance of students (i.e., your dependent variable), you may also want to ensure that the lecturers/teachers involved in the study had a similar educational background (e.g., a teaching degree, a degree in the subject being taught, etc.), teaching experience (e.g., number of years teaching), and so forth.

The goal of such random assignment is to avoid the potential selection bias that can occur when the groups that are being compared are not similar before the research starts. Taking the example above, we may expect that students who not only received lectures, but also seminar classes, would perform better than those students who only received lectures. However, what if the lecturers who taught the group that only had lectures (and no seminar classes) were considerably more experienced teachers, with a much stronger educational background than those lecturers that taught the group that had lectures and seminar classes? We may still expect the students who had both lectures and seminar classes to get higher exam marks than the students that only had lectures, but perhaps the difference in the exam marks is no longer significant. You would then no longer know if the difference in the exam marks (i.e., your dependent variable) was due to the differences in the teachers' ability (a source of bias) or the two different teaching methods (i.e., the independent variable). This creates a threat to internal validity.

Whilst we focused on the selection bias arising from the lecturers/teachers in our example, we could have similarly talked about the students that were assigned to the two groups (i.e., the group who only received lectures, versus the group who received lectures and seminars). After all, if one group consisted of a larger proportion of brighter students (or more conscientious students) than the other group, this could also explain any difference in exam performance between the groups, rather than the type of teaching method the groups received.

Selection bias is likely to be a more significant threat to internal validity when you are using a quasi-experimental research design. In comparison to the experimental research design, the quasi-experimental research design does not involve the random assignment of participants to the different groups being compared. As the article, Quasi-experimental research designs, shows, such a quasi-experimental research design may have been chosen intentionally, or it may not have been possible to randomly assign participants. This may reflect the difficulty in meeting the requirements of a probability sample, such as obtaining a detailed list of the population being studied, which forces you to select a non-probability sample [see the section: Sampling Strategy]; or you may be studying pre-existing groups where it is impossible to reassign participants to different groups (e.g., a class of students from one school and a class of students from another school).

To learn more about selecting an appropriate sampling strategy in your dissertation, which will help you to reduce the potential for sampling bias, see the section: Sampling Strategy.