Reliability in research | Lærd Dissertation

Single measurement point

Unlike test-retest reliability, parallel-forms reliability and inter-rater reliability, testing for internal consistency only requires the measurement procedure to be completed once (i.e., during the course of the experiment, without the need for a pre- and post-test). This may reflect post-test only designs in experimental and quasi-experimental research, as well as single tests in non-experimental research (e.g., relationship-based research) that have no intervention/treatment [see the articles, Experimental research designs, Quasi-experimental research designs and Relationship-based research designs, if you are unsure about the differences between these types of quantitative research design].

When faced with such a scenario (i.e., where the measurement procedure is only completed once), we examine the reliability of the measurement procedure that has been created in terms of its internal consistency; that is, the internal consistency of the different items that make up the measurement instrument. Reliability as internal consistency can be determined using a number of methods. We look at the split-half method and Cronbach's alpha:

Split-half reliability
Split-half reliability is mainly used for written/standardized tests, but it is sometimes used in physical/human performance tests (albeit ones that require a number of trials). However, it is based on the assumption that the measurement procedure can be divided (i.e., split) into two matched halves.
Split-half reliability is assessed by splitting the measures/items from the measurement procedure in half, and then calculating the scores for each half separately. Before calculating the split-half reliability of the scores, you have to decide how to split the measures/items from the measurement procedure (e.g., a written/standardized test). How you do this will affect the values you obtain.
- One option is simply to divide the measurement procedure in half; that is, take the scores from the measures/items in the first half of the measurement procedure and compare them to the scores from the measures/items in the second half. This can be problematic because of (a) issues of test design (e.g., easier/harder questions are in the first/second half of the measurement procedure), (b) participant fatigue/concentration/focus (i.e., scores may decrease during the second half of the measurement procedure), and (c) different items/types of content in different parts of the test.
- Another option is to compare odd- and even-numbered items/measures from the measurement procedure. The aim of this method is to match the measures/items that are being compared in terms of content, test design (i.e., difficulty), participant demands, and so forth. This helps to avoid some of the potential biases that arise from simply dividing the measurement procedure in two.
After dividing the measures/items from the measurement procedure, the scores from each of the halves are calculated separately, before the internal consistency between the two sets of scores is assessed. This is usually done by correlating the two sets of half-scores, and the resulting correlation is often then adjusted using the Spearman-Brown formula, which estimates the reliability of the full-length measurement procedure from the correlation between its two halves. The measurement procedure is considered to demonstrate split-half reliability if the two sets of scores are highly correlated (i.e., there is a strong relationship between the scores).
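To make the steps above more concrete, the following is a minimal sketch in Python (using NumPy) of how a split-half coefficient might be calculated. The function name split_half_reliability, the item groupings and the small data set are all hypothetical and made up purely for illustration; rows are participants and columns are items. It assumes the two halves have already been chosen, correlates the two sets of half-scores, and then applies the Spearman-Brown formula, 2r / (1 + r).

```python
import numpy as np


def split_half_reliability(scores, half_a, half_b):
    """Return (half-test correlation, Spearman-Brown adjusted estimate).

    scores: 2-D array-like, rows = participants, columns = items.
    half_a, half_b: lists of column indices that make up each half.
    """
    scores = np.asarray(scores, dtype=float)
    total_a = scores[:, half_a].sum(axis=1)  # each participant's score on half A
    total_b = scores[:, half_b].sum(axis=1)  # each participant's score on half B
    r_half = np.corrcoef(total_a, total_b)[0, 1]  # Pearson correlation of the halves
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown adjustment
    return r_half, r_full


# Hypothetical responses: 5 participants x 4 items (illustrative only).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

# Option 1: first half of the items versus second half.
r_first_second, sb_first_second = split_half_reliability(scores, [0, 1], [2, 3])

# Option 2: odd-numbered items (1, 3) versus even-numbered items (2, 4).
r_odd_even, sb_odd_even = split_half_reliability(scores, [0, 2], [1, 3])

print(f"first/second: r = {r_first_second:.3f}, adjusted = {sb_first_second:.3f}")
print(f"odd/even:     r = {r_odd_even:.3f}, adjusted = {sb_odd_even:.3f}")
```

Even with this tiny made-up data set, the two ways of splitting the items give different coefficients, which illustrates why the choice of split matters.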
Cronbach's alpha
Cronbach's alpha coefficient (also known as the coefficient alpha technique or alpha coefficient of reliability) is a test of reliability as internal consistency (Cronbach, 1951). At the undergraduate and master's dissertation level, it is more likely to be used than the split-half method. It is most likely to be used in written/standardized tests (e.g., a survey).
Cronbach's alpha is closely related to split-half reliability. However, rather than computing the split-half reliability on the measurement procedure only once (i.e., by examining just two sets of scores), Cronbach's alpha takes into account each measure/item within a measurement procedure (e.g., every question within a survey); it can be thought of as an average over the many possible ways of splitting the measures/items in half. In this sense, Cronbach's alpha examines the relationship between the scores on each measure/item and the sum of all the other relevant measures/items you are interested in. This provides us with a coefficient based on the inter-item relationships, where a strong relationship between the measures/items within the measurement procedure suggests high internal consistency (e.g., a Cronbach's alpha coefficient of .80).
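As a rough illustration of the calculation, the sketch below uses the standard formula for coefficient alpha: alpha = k / (k - 1) x (1 - sum of the item variances / variance of the total scores), where k is the number of items. The function name cronbach_alpha and the data are hypothetical; in practice you would normally obtain this value from a statistics package rather than code it yourself.

```python
import numpy as np


def cronbach_alpha(scores):
    """Coefficient alpha for a participants-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]  # number of items
    item_variances = scores.var(axis=0, ddof=1)  # sample variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of participants' total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)


# Hypothetical responses: 5 participants x 4 items (illustrative only).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Note that every item contributes to the result, so there is no need to decide how to split the measurement procedure beforehand.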
Cronbach's alpha is often used when you have multi-item scales (e.g., a measurement procedure, such as a survey, with multiple questions). It is also a versatile test of reliability as internal consistency because it can be used for attitudinal measurements, which are popular amongst undergraduate and master's level students (e.g., attitudinal measurements include Likert scales with options such as strongly agree, agree, neither agree nor disagree, disagree, strongly disagree). However, Cronbach's alpha does not determine the unidimensionality of a measurement procedure; that is, it does not tell you whether a measurement procedure measures only one construct (e.g., depression) or whether multiple constructs are being measured within it (e.g., depression and employee burnout). This is because you could get a high Cronbach's alpha coefficient (e.g., .80) when testing a measurement procedure that involves two or more constructs.
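The point about unidimensionality can be illustrated with a small worked calculation. The sketch below is based on assumed values rather than real data: it imagines a 20-item measurement procedure made up of two sets of 10 items, where items within each set correlate .70 with one another, but items from different sets do not correlate at all (i.e., two unrelated constructs). Alpha is calculated directly from this hypothetical correlation matrix, using the same formula expressed in terms of the matrix: k / (k - 1) x (1 - sum of the diagonal / sum of all entries). The function names are made up for illustration.

```python
import numpy as np


def alpha_from_matrix(matrix):
    """Coefficient alpha from an item covariance (or correlation) matrix."""
    matrix = np.asarray(matrix, dtype=float)
    k = matrix.shape[0]  # number of items
    # The trace is the sum of item variances; the full sum is the variance of the total score.
    return (k / (k - 1)) * (1 - np.trace(matrix) / matrix.sum())


def construct_block(n_items, r):
    """Correlation matrix for n_items items that all correlate r with each other."""
    return np.full((n_items, n_items), r) + (1 - r) * np.eye(n_items)


n_per_construct = 10  # assumed number of items per construct
r_within = 0.70       # assumed correlation between items measuring the same construct
unrelated = np.zeros((n_per_construct, n_per_construct))  # zero correlation across constructs

two_constructs = np.block([
    [construct_block(n_per_construct, r_within), unrelated],
    [unrelated, construct_block(n_per_construct, r_within)],
])

alpha_two_constructs = alpha_from_matrix(two_constructs)
print(f"alpha for two unrelated constructs = {alpha_two_constructs:.3f}")
```

Under these assumptions the coefficient comes out at roughly .91, even though the items clearly measure two separate things. A high alpha on its own should therefore not be read as evidence that only one construct is being measured.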

Bibliography

Bartholomew, D. J. (2002). Measuring intelligence: Facts and fallacies. Cambridge: Cambridge University Press.

Cronbach, L. J. (1947). Test "reliability": Its meaning and determination. Psychometrika, 12(1): 1-16.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3): 297-334.

Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3): 151-160.

Miller, M. B. (1995). Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modelling. Structural Equation Modeling, 2(3): 255-273.

Ratcliff, R. (1993). Methods for dealing with reaction time outliers. Psychological Bulletin, 114(3): 510-532.

Salthouse, T. A., & Hedden, T. (2002). Interpreting reaction time measures in between-group comparisons. Journal of Clinical and Experimental Neuropsychology, 24(7): 858-872.

Schuerger, J. M., & Witt, A. C. (1989). The temporal stability of individually tested intelligence. Journal of Clinical Psychology, 45(2): 294-302.

Yellott, Jr., J. I. (1971). Correction for fast guessing and the speed-accuracy tradeoff in choice reaction time. Journal of Mathematical Psychology, 8(2): 159-199.
