Threats to reliability

Threats to reliability are those factors that cause (or are sources of) error. After all, the instability or inconsistency in the measurement you are using comes from such error. Some of the sources of error in your dissertation may include: researcher (or observer) error, environmental changes and participant changes.

Researcher (or observer) error

There are many situations during the dissertation process where you are responsible for taking measurements. As the researcher, you can introduce error when carrying out these measurements. This is known as researcher (or observer) error. Even when a measurement instrument is considered to be precise (e.g., a stopwatch), your judgement will often be involved in its use (e.g., when to start and stop the stopwatch). Human error (or human differences) is also a factor (e.g., the reaction time to start the watch). This becomes a greater problem as the number of researchers (observers) increases and/or the number of measurements increases (e.g., 10 people using stopwatches, making 100 time measurements).
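To make the stopwatch example concrete, the following is a minimal simulation sketch, not a description of any real study. It assumes that each observer has a small personal timing bias (e.g., a habitual reaction delay) plus random trial-to-trial jitter. The function name simulate_stopwatch, the true duration of 10 seconds, and the bias and jitter sizes are all hypothetical values chosen for illustration.

```python
import random
import statistics


def simulate_stopwatch(true_seconds, n_observers, n_trials, seed=0):
    """Simulate stopwatch readings of one event timed by several observers.

    Assumptions (illustrative only): each observer has a fixed personal
    bias drawn from a normal distribution (SD 0.10 s), and every reading
    carries additional random jitter (SD 0.05 s).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    readings = []
    for _ in range(n_observers):
        observer_bias = rng.gauss(0.0, 0.10)  # systematic, per observer
        for _ in range(n_trials):
            jitter = rng.gauss(0.0, 0.05)  # random, per measurement
            readings.append(true_seconds + observer_bias + jitter)
    return readings


# 10 observers each taking 10 readings, i.e., 100 time measurements
readings = simulate_stopwatch(10.0, n_observers=10, n_trials=10)
spread = statistics.stdev(readings)
print(f"{len(readings)} readings, standard deviation = {spread:.3f} s")
```

Even though the event being timed never changes in this sketch, the readings differ from one another. That spread is the kind of researcher (or observer) error that lowers reliability.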

Environmental changes

During the time between measurements (e.g., recording time on a stopwatch), there may be small environmental changes that influence the measurements being taken, creating error. These changes in the environment make it impossible to ensure that the same individual is measured in the same way (i.e., under identical conditions). For example, even two closely timed measurements may be affected by environmental conditions/variables (e.g., light, day, time, temperature). Measuring individuals in the same way each time (i.e., under identical environmental conditions, without any environmental change) is therefore an ideal to aim for rather than something you can fully achieve.

Participant changes

Between measurements, it is also possible for research participants to change in some way. Whilst this potential for change is generally reduced if the time between measurements is short, this is not necessarily the case. It depends on the nature of the measurement (e.g., focus/attention affects reaction times, hunger/tiredness leads to reduced physical/mental performance, etc.). These participant changes can create error that reduces the reliability (i.e., consistency or stability) of measurements.

Types and methods/measures of reliability

The type of reliability that you should apply in your dissertation will vary depending on the research methods you select. In the sections below, we look at (a) successive measurements, (b) simultaneous measurements by more than one researcher, and (c) a single measurement point.

Successive measurements

It is common in quantitative research for successive measurements to be taken. After all, in experimental research and quasi-experimental research, researchers often conduct a pre-test, followed by a post-test [see the articles: Experimental research designs and Quasi-experimental research designs]. In such cases, we want to make sure that the measurement procedures that are used (e.g., a questionnaire, survey) produce measurements that are reliable, both for the pre-test and the post-test. Sometimes the measurement procedures are the same for the pre-test and the post-test, whilst on other occasions a different measurement procedure is used in the post-test. In both cases, we need to make sure that the measurement procedures that are used are reliable. However, we use different tests of reliability to achieve this: (a) test-retest reliability on separate days; and (b) parallel-forms reliability. Each of these tests of reliability is discussed in turn:

Simultaneous measurements by more than one researcher

In quantitative research, sometimes more than one researcher is required when collecting measurements, which makes it important to assess the reliability of the simultaneous measurements that are taken. There are two common reasons for this: (a) experimenter bias and instrumental bias; and (b) experimental demands. Let's look at each in turn:

Since the judgement of researchers is not perfect, we cannot assume that different researchers will record a measurement of something in the same way (e.g., measure the social awkwardness of a person on a scale of 1-10 simply by observing them). In order to assess how reliable such simultaneous measurements are, we can use inter-rater reliability. Such inter-rater reliability is a measure of the correlation between the scores provided by the two observers, which indicates the extent of the agreement between them (i.e., reliability as equivalence). To learn more about inter-rater reliability, how to calculate it using the statistics software SPSS, interpret the findings and write them up, see the Data Analysis section of Lærd Dissertation.
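As a rough illustration of inter-rater reliability as a correlation, the sketch below computes Pearson's correlation coefficient between the scores of two observers. The choice of Pearson's r is an assumption, because the text above only says "correlation", and other statistics (e.g., Cohen's kappa or the intraclass correlation coefficient) are also used for inter-rater reliability. The ratings themselves are made-up values on a 1-10 scale, not real data.

```python
from math import sqrt


def pearson_r(x, y):
    """Return Pearson's correlation coefficient between two score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a rater gives constant scores")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical social awkwardness ratings (1-10) given by two observers
# to the same six participants
rater_a = [3, 5, 7, 4, 8, 6]
rater_b = [4, 5, 8, 4, 7, 6]

r = pearson_r(rater_a, rater_b)
print(f"Inter-rater correlation: r = {r:.2f}")
```

A value of r close to 1 would suggest that the two observers rank participants in a very similar way. Note that a correlation reflects consistency between the observers rather than exact agreement: one observer could score every participant two points higher than the other and still produce a correlation of 1.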
