Stratified random sampling is a type of probability sampling technique [see our article Probability sampling if you do not know what probability sampling is]. Simple random sampling and systematic random sampling treat the population as a single group, but sometimes we are interested in particular strata (meaning groups) within the population (e.g., males vs. females; houses vs. apartments, etc.) [see our article, Sampling: The basics, if you are unsure about the terms unit, sample, strata and population]. With the stratified random sample, there is an equal chance (probability) of selecting each unit from within a particular stratum (group) of the population when creating the sample. This article explains (a) what stratified random sampling is, (b) how to create a stratified random sample, and (c) the advantages and disadvantages (limitations) of stratified random sampling.
Imagine that a researcher wants to understand more about the career goals of students at the University of Bath. Let's say that the university has roughly 10,000 students. These 10,000 students are our population (N). In order to select a sample (n) of students from this population of 10,000 students, we could choose to use a simple random sample or a systematic random sample. However, sometimes we are interested in particular strata (groups) within the population. Therefore, the stratified random sample involves dividing the population into two or more strata (groups). These strata are expressed as H.
For example, imagine we were interested in comparing the differences in career goals between male and female students at the University of Bath. If this was the case, we would want to ensure that the sample we selected had a proportional number of male and female students. This is known as proportionate stratification (as opposed to disproportionate stratification, where the sample size of each stratum is not proportionate to the population size of that stratum). With stratified random sampling, there would be an equal chance (probability) that each female or male student could be selected for inclusion in each stratum of our sample. However, in line with proportionate stratification, the total number of male and female students included in our sample would only be equal if 5,000 students from the university were male and the other 5,000 students were female. Since this is unlikely to be the case, the number of units that should be selected for each stratum (i.e., the number of male and female students selected) will vary. We explain how this is achieved in the next section: Creating a stratified random sample.
To create a stratified random sample, there are seven steps: (a) defining the population; (b) choosing the relevant stratification; (c) listing the population; (d) listing the population according to the chosen stratification; (e) choosing your sample size; (f) calculating a proportionate stratification; and (g) using a simple random or systematic sample to select your sample.
In our example, the population is the 10,000 students at the University of Bath. The population is expressed as N. Since we are interested in all of these university students, we can say that our sampling frame is all 10,000 students. If we were only interested in female university students, for example, we would exclude all males in creating our sampling frame, which would be much less than 10,000.
If we wanted to look at the differences in male and female students, this would mean choosing gender as the stratification, but it could similarly involve choosing students from different subjects (e.g., social sciences, medicine, engineering, education, etc.), year groups, or some other variable(s). For the purposes of this example, we will use gender (male/female) as our strata.
We need to identify all 10,000 students at the University of Bath. If you were actually carrying out this research, you would most likely have had to receive permission from Student Records (or another department in the university) to view a list of all students studying at the university. You can read about this later in the article under Disadvantages (limitations) of stratified random sampling.
As with the simple random sampling and systematic random sampling techniques, we need to assign a consecutive number to each of the students in each stratum, running from 1 to Nk, where Nk is the number of students in that stratum. As a result, we would end up with two lists, one detailing all male students and one detailing all female students.
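The listing step above can be sketched in a few lines of Python. This is only an illustration: the student records, the "gender" field, and the function name list_by_stratum are hypothetical, and a real list would come from a source such as Student Records.

```python
from collections import defaultdict

def list_by_stratum(population, key):
    """Group units by stratum and number them 1..Nk within each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[key(unit)].append(unit)
    # Map each consecutive number (starting at 1) to a unit, per stratum.
    return {name: dict(enumerate(units, start=1)) for name, units in strata.items()}

# Hypothetical records standing in for the full list of students.
students = [
    {"name": "Student A", "gender": "female"},
    {"name": "Student B", "gender": "male"},
    {"name": "Student C", "gender": "female"},
]

numbered = list_by_stratum(students, key=lambda s: s["gender"])
for stratum, units in numbered.items():
    print(stratum, units)
```

Each stratum gets its own numbering, so the same number can appear once in the male list and once in the female list.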
Let's imagine that we choose a sample size of 100 students. The sample is expressed as n. This number was chosen because it reflects the limit of our budget and the time we have to distribute our questionnaire to students. However, we could have also determined the sample size we needed using a sample size calculation, which is a particularly useful statistical tool. This may have suggested that we needed a larger sample size; perhaps as many as 400 students.
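The article does not say which sample size calculation it has in mind. As one common possibility, the sketch below uses Cochran's formula for a proportion with a finite population correction. The 95% confidence level (z = 1.96), the 5% margin of error, and the assumed proportion of 0.5 are all illustrative assumptions, not values taken from the example.

```python
import math

def required_sample_size(population_size, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size for a proportion, with finite population correction."""
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Adjust downward because the population is finite.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)

n_required = required_sample_size(10_000)
print(n_required)
```

Under these assumptions the result for 10,000 students is roughly 370, which is of the same order as the 400 students mentioned above. Different assumptions would give a different figure.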
Imagine that of the 10,000 students, 60% of these are female and 40% male. We need to ensure that the number of units selected for the sample from each stratum is proportionate to the number of males and females in the population. To achieve this, we first multiply the desired sample size (n) by the proportion of units in each stratum. Therefore, to calculate the number of female students required in our sample, we multiply 100 by 0.60 (i.e., 0.60 = 60% of the population of students at the university), which gives us a total of 60 female students. If we do the same for male students, we get 40 students (i.e., 40% of students are male, where 100 x 0.40 = 40). This means that we need to select 60 female students and 40 male students for our sample of 100 students.
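The same arithmetic can be written as a short Python sketch. The function name proportionate_allocation is hypothetical. The stratum sizes of 6,000 and 4,000 simply restate the 60% and 40% split of the 10,000 students in the example.

```python
def proportionate_allocation(sample_size, stratum_sizes):
    """Return how many units to sample from each stratum.

    Each stratum receives sample_size multiplied by its share of the population.
    """
    population_size = sum(stratum_sizes.values())
    return {
        name: round(sample_size * size / population_size)
        for name, size in stratum_sizes.items()
    }

stratum_sizes = {"female": 6000, "male": 4000}
allocation = proportionate_allocation(100, stratum_sizes)
print(allocation)  # {'female': 60, 'male': 40}
```

When the proportions do not divide evenly, rounding can leave the stratum totals one or two units away from the desired sample size, so the totals should be checked and adjusted by hand.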
Now that we have chosen to sample 40 male and 60 female students, we still need to select these students from our two lists of male and female students (see STEP FOUR above). We do this using either simple random sampling or systematic random sampling [click on the links to see what to do next].
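As a minimal sketch of this final step, the Python below draws a simple random sample without replacement from each numbered list. The function name stratified_sample is hypothetical, the lists of numbers stand in for the two lists of students, and the seed is fixed only so that the draw can be repeated.

```python
import random

def stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample (without replacement) from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, allocation[name])
        for name, units in strata.items()
    }

# Students are represented by their consecutive numbers within each stratum.
strata = {
    "female": list(range(1, 6001)),  # 6,000 female students
    "male": list(range(1, 4001)),    # 4,000 male students
}
allocation = {"female": 60, "male": 40}

sample = stratified_sample(strata, allocation, seed=42)
print(len(sample["female"]), len(sample["male"]))  # 60 40
```

Within each stratum every student has the same chance of selection, which is the defining property described at the start of this article. A systematic random sample could be substituted for the call to rng.sample.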
The advantages and disadvantages (limitations) of stratified random sampling are explained below. Many of these are similar to other types of probability sampling technique, but with some exceptions. Whilst stratified random sampling is one of the 'gold standards' of sampling techniques, it presents many challenges for students conducting dissertation research at the undergraduate and master's level.
Advantages of stratified random sampling
The aim of the stratified random sample is to reduce the potential for human bias in the selection of cases to be included in the sample. As a result, the stratified random sample provides us with a sample that is highly representative of the population being studied, assuming that there is limited missing data.
Since the units selected for inclusion within the sample are chosen using probabilistic methods, stratified random sampling allows us to make statistical conclusions from the data collected that will be considered to be valid.
Relative to the simple random sample, the selection of units using a stratified procedure can be viewed as superior because it improves the potential for the units to be more evenly spread over the population. Furthermore, where the samples are the same size, a stratified random sample can provide greater precision than a simple random sample. Because of the greater precision of a stratified random sample compared with a simple random sample, it may be possible to use a smaller sample, which saves time and money.
The stratified random sample also improves the representation of particular strata (groups) within the population, as well as ensuring that these strata are not over-represented. Together, this helps the researcher to compare strata, as well as make more valid inferences from the sample to the population.
Disadvantages (limitations) of stratified random sampling
A stratified random sample can only be carried out if a complete list of the population is available.
It must also be possible for the list of the population to be clearly delineated into each stratum; that is, each unit from the population must only belong to one stratum. In our example, this would be fairly simple, since our strata are male and female students. Clearly, a student could only be classified as either male or female. No student could fit into both categories (ignoring transgender issues).
Furthermore, imagine extending the sampling requirements such that we were also interested in how career goals changed depending on whether a student was an undergraduate or graduate. Since the strata must be mutually exclusive and collectively exhaustive, this means that we would need to sample four strata from the population: undergraduate males, undergraduate females, graduate males, and graduate females. This will increase the overall sample size required for the research, which can increase the cost and time needed to carry it out.
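To see why the number of strata grows so quickly, the short Python sketch below crosses the two stratification variables. The variable names are illustrative only.

```python
from itertools import product

levels = ["undergraduate", "graduate"]
genders = ["male", "female"]

# Every combination of level and gender is its own stratum.
strata = [f"{level} {gender}s" for level, gender in product(levels, genders)]
print(strata)
# ['undergraduate males', 'undergraduate females', 'graduate males', 'graduate females']
```

Each additional stratification variable multiplies the number of strata by its number of categories, and each stratum needs enough units to be analysed in its own right.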
Attaining a complete list of the population can be difficult for a number of reasons:
Even if a list is readily available, it may be challenging to gain access to that list. The list may be protected by privacy policies or require a lengthy process to attain permissions.
There may be no single list detailing the population you are interested in. As a result, it may be difficult and time consuming to bring together numerous sub-lists to create a final list from which you want to select your sample. As an undergraduate or master's level dissertation student, you may simply not have sufficient time to do this. Indeed, it will be more complex and time consuming to prepare this list compared with simple random sampling and systematic random sampling.
Many lists will not be in the public domain and their purchase may be expensive; at least in terms of the research funds of a typical undergraduate or master's level dissertation student.
In terms of human populations (as opposed to other types of populations; see the article: Sampling: The basics), some of these populations will be expensive and time consuming to contact, even where a list is available. Assuming that your list has all the contact details of potential participants in the first instance, managing the different ways (postal, telephone, email) that may be required to contact your sample may be challenging, not forgetting the fact that your sample may also be geographically scattered.
In the case of human populations, to avoid potential bias in your sample, you will also need to try and ensure that an adequate proportion of your sample takes part in the research. This may require re-contacting non-respondents, which can be very time consuming, or reaching out to new respondents.