## Probability sampling

Probability sampling is a group of sampling techniques that help researchers select units from a population that they are interested in studying. Collectively, these units form the sample that the researcher studies [see our article, Sampling: The basics, to learn more about terms such as unit, sample and population]. A core characteristic of probability sampling techniques is that units are selected from the population at random using probabilistic methods. This enables researchers to make statistical inferences (i.e., generalisations) from the sample being studied to the population of interest. This article discusses the principles of probability sampling and briefly sets out the types of probability sampling technique discussed in detail in other articles within this site. The article is divided into two sections: principles of probability sampling and types of probability sampling.

### Principles of probability sampling

There are a number of theoretical and practical reasons for using probability sampling: (a) making statistical inferences; (b) achieving a representative sample; (c) minimising sampling bias; (d) selecting units using probabilistic methods; and (e) meeting the criteria for probability sampling. Each of these basic principles of probability sampling is discussed in turn. However, to provide some context for these basic principles, we will use the following example.

Career choices of all students at the University of Oxford, England
Imagine we were interested in examining the career choices of all students at the University of Oxford, England. The students would be our units; all of those students studying at the University of Oxford in England would be our population; and the career choices of these students would reflect the phenomenon that we are interested in studying. If we choose to sample 250 of these students, our sample size would be 250 units.

Now let's look at each of these basic principles of probability sampling:

#### Making statistical inferences

When we study a sample of a population, the immediate task is often to analyse the data that we have collected from that sample. In the case of the 250 students that make up our sample at the University of Oxford, we would analyse the data that we had collected about their career choices. For example, we may have asked the following closed question:

Question
What factors influence your choice of career?

Let's imagine we provided the students with the following options:

Options [tick all that apply]

• Career prospects

• Nature of the work

• Physical working conditions

• Salary and benefits

• Other (if Other, please state what this is...)

When we analysed all the responses from the 250 students in our sample, let's imagine that career prospects was by far the most important factor influencing the students' career choices, with the remaining options (e.g., salary and benefits, physical working conditions) not being at all important.

This tells us something interesting about the sample, but we actually want to know about the population, not just the sample. We want to know if our findings from the sample can be generalised to the population. In other words, we want to make generalisations about the population from our sample. This is particularly important if we were following a quantitative research design.

However, when we use probability sampling to select units from the population to include in our sample, with the aim of making generalisations from the sample to the population, we use the more precise term, statistical inferences, instead of generalisations. This reflects the fact that probability sampling uses statistical techniques to make such generalisations; techniques that also allow us to know how accurately our sample represents the population of interest. In other words, when we state that career prospects was the most important factor influencing the career choices of all students at the University of Oxford (i.e., our population), based on our sample of 250 students at the university, we want to know how confident we are that this is the case. Probability sampling helps us to make such statistical inferences and assess how confident we are about such inferences.
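To make the idea of "how confident we are" more concrete, the following is a minimal sketch of one common approach: a normal-approximation (Wald) confidence interval for a sample proportion. The count of 180 students is a hypothetical figure invented for illustration, the function name is our own, and the calculation ignores refinements such as a finite population correction.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical figure: 180 of the 250 sampled students ticked "Career prospects".
p_hat, lower, upper = proportion_confidence_interval(successes=180, n=250)
print(f"Sample proportion: {p_hat:.3f}")
print(f"Approximate 95% interval: {lower:.3f} to {upper:.3f}")
```

Under these assumptions, the interval gives a range of plausible values for the proportion of all students at the university who would select career prospects. It is only meaningful if the 250 students were selected using a probability sampling technique.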

#### Achieving a representative sample

A critical component of probability sampling is the need to create a sample that is representative of the population. The more representative the sample is of the population, the more confident we can be when making statistical inferences (i.e., generalisations) from the sample to the population of interest.

If all units within the population were identical in all respects there would be no need to sample at all. Under this scenario of perfect homogeneity of units, we could simply study a single unit since this would reflect the population perfectly.

For example:
If all students at the University of Oxford were of the same age and gender, had the same levels of ambition, the same attitudes and desires, and so forth, we could simply ask one student what factors influenced his/her career choice. The answer to this could then be generalised to the whole population; so if the single student stated that salary and benefits was the most important factor, we would be able to say that the same was true for all students at the university.

However, populations are not homogeneous (i.e., not the same), especially populations that consist of people. Therefore, we need to make sure when sampling that we create a sample that is representative of the population we are interested in; that is, a sample that has the same variations that we would see in the population. When a sample is not representative of the population, this can lead to bias.

#### Minimising sampling bias

In sampling, bias means that the units selected from the population for inclusion in your sample are not representative of that population. When units are selected from a population for inclusion in a sample, it is inevitable that there will be some sampling bias. However, one of the reasons to use probability sampling is that it is particularly effective at helping to minimise such sampling bias compared with non-probability sampling.

Sampling bias can occur for a number of reasons, but biases are often practical in nature. Let's look at two types of bias:

1. Sampling frames and populations

The sampling frame is very similar to the population you are studying, and may be exactly the same. When selecting units from the population to be included in your sample, probability sampling requires that you obtain a list of the population from which you select units. This list is the sampling frame from which you select units.

For example:
In the case of the students at the University of Oxford, this would mean that we need to obtain a list of all students studying at the university. Such a list would most likely be held by the department known as Student Records. The list would likely contain details about each student (e.g., name, age, gender, degree programme, email address, etc.).

Sampling bias occurs when the sampling frame and the population are not consonant (i.e., not the same). This could happen if we were unable to obtain permission to get access to the list of the population we are interested in, which is a common occurrence.

For example:
It is quite likely that Student Records would be unwilling to provide access to a list of all students at the University of Oxford for reasons of confidentiality and privacy. However, even if Student Records were prepared to provide access to the list under certain conditions, the list may not be 100% up-to-date. For example, there may have been a new intake of students and the list may not yet have been updated.

Whilst such differences between the sampling frame and the population we are interested in may be small, they still lead to sampling bias. Such sampling bias can result in researchers making over-generalisations because a particular characteristic and/or group (i.e., stratum) from the population is under- or over-represented in the sample.
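The gap between a sampling frame and a population can be checked with simple set operations. The sketch below uses hypothetical student ID numbers (none of them come from a real university) and assumes an out-of-date list that misses a new intake and still includes some students who have left.

```python
# Hypothetical student IDs, invented for illustration.
population = set(range(1, 1001))  # students currently enrolled

# Frame assumed to be out of date: it misses a new intake (IDs 951-1000)
# and still lists 20 students who have left (IDs 1001-1020).
sampling_frame = set(range(1, 951)) | set(range(1001, 1021))

missing_from_frame = population - sampling_frame  # can never be sampled
listed_in_error = sampling_frame - population     # could be sampled by mistake
coverage = len(population & sampling_frame) / len(population)

print(f"Missing from frame: {len(missing_from_frame)}")
print(f"Listed in error: {len(listed_in_error)}")
print(f"Coverage of population: {coverage:.1%}")
```

In practice the true population is rarely available as a list, so a comparison like this is usually only possible against a second, more recent source.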

2. Conscious and unconscious human choices

Researchers have conscious and unconscious biases that can affect how they select units from the population for inclusion in their sample. For example, if approaching people in the high street, researchers may consciously choose to approach people that they feel are more like themselves. This may even be an unconscious action. They may also choose to select units based on ease or lower cost. These are some of the typical sampling biases of non-probability sampling techniques such as convenience sampling.

Probability sampling is particularly effective (compared with non-probability sampling) at reducing this type of sampling bias. It achieves this through the use of probabilistic methods in the selection of units from the population for inclusion in the sample. We discuss this next.

#### Selecting units using probabilistic methods

A cornerstone of probability sampling is the use of random selection (i.e., probabilistic methods) to help you select units from your sampling frame to be included in your sample. The purpose of random selection is the creation of a sample whose units are representative of (i.e., have very similar characteristics to) the population they represent. With random selection, each unit has an equal chance (i.e., equal probability) of being selected. The use of random selection not only improves the chance of creating a representative sample, but also provides you with methods to estimate how likely (i.e., probable) this will be.

With probability sampling, units can be randomly selected with the aid of random number tables or a random number generator. However, the procedure to select units from the sampling frame differs depending on the type of probability sampling technique that is used. Nonetheless, these procedures are very clearly defined, making it easy to follow them. We briefly discuss these different types of sampling technique later in this article [see the section: Types of probability sampling].
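The claim that random selection gives each unit an equal chance can be checked with a small simulation. The sketch below is illustrative only: the frame of 20 student IDs, the sample size of 5 and the function name are all assumptions. It uses Python's standard random number generator to draw many samples and records how often each unit is included.

```python
import random
from collections import Counter

def inclusion_frequencies(frame, sample_size, repetitions, seed=None):
    """Share of repeated random samples in which each unit appears."""
    rng = random.Random(seed)  # seeded generator so the run can be repeated
    counts = Counter()
    for _ in range(repetitions):
        # rng.sample draws without replacement, so no unit is picked twice.
        counts.update(rng.sample(frame, sample_size))
    return {unit: counts[unit] / repetitions for unit in frame}

# Hypothetical sampling frame of 20 student IDs.
frame = [f"student_{i:02d}" for i in range(1, 21)]
frequencies = inclusion_frequencies(frame, sample_size=5, repetitions=20_000, seed=1)

for unit, freq in frequencies.items():
    # Each value should sit close to 5 / 20 = 0.25.
    print(f"{unit}: {freq:.3f}")
```

No unit is systematically favoured, which is the property that human selection in the high street cannot guarantee.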

#### In sum...

We can say that the basic principle of probability sampling is to ensure that the sample being studied is representative of the population of interest. This helps to minimise potential sampling bias that would reduce your ability to make generalisations (i.e., statistical inferences) from the sample to the population. To minimise sampling bias, probabilistic methods are used so that units from the population are selected at random; the objective is that each unit has an equal chance of being selected. To achieve this, a list of the population must be attainable and this sampling frame must be the same as (or similar to) the population being studied. Different probability sampling techniques are subsequently used to select units from the population to create your sample, depending on the context of the population you are studying.

If you are considering whether to use probability sampling, it is important to consider how your choice of research strategy will influence whether this is an appropriate decision. Even if you know that probability sampling fits with the research strategy guiding your dissertation, it is important to choose the appropriate type of probability sampling technique. These probability sampling techniques are briefly set out in the next section.

### Types of probability sampling

There are three types of probability sampling technique that you may use when doing a dissertation at the undergraduate and master's level: simple random sampling, systematic random sampling and stratified random sampling.

To get a sense of what these three types of probability sampling technique are, imagine that a researcher wants to understand more about the career goals of students at a single university. Let's say that the university has roughly 10,000 students. These 10,000 students are our population (N). Each of the 10,000 students is known as a unit (although sometimes other terms are used to describe a unit; see Sampling: The basics). In order to select a sample (n) of students from this population of 10,000 students, we could choose to use simple random sampling, systematic random sampling or stratified random sampling:

• Simple random sampling

With simple random sampling, there is an equal chance (probability) that each of the 10,000 students could be selected for inclusion in our sample. If our desired sample size was around 200 students, we would select 200 students at random, probably using random number tables. To understand more about simple random sampling, how to create a simple random sample, and the advantages and disadvantages of this probability sampling technique, see the article: Simple random sampling.
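As a rough sketch of how this might look with a random number generator rather than random number tables, the example below numbers the students from 1 to 10,000 (an assumption; a real sampling frame would hold student records) and draws 200 of them without replacement. The function name and seed are our own.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw unit numbers 1..population_size without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

N = 10_000  # population size from the example
n = 200     # desired sample size from the example

selected = simple_random_sample(N, n, seed=2024)
print(f"Selected {len(selected)} students; first five: {sorted(selected)[:5]}")
print(f"Chance of any one student being selected: {n / N:.2%}")
```

Because every student has the same chance of selection (n / N), no further weighting is needed when analysing a sample drawn this way.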

• Systematic random sampling

Systematic random sampling is a variation on simple random sampling. Like simple random sampling, there is an equal chance (probability) that each of the 10,000 students could be selected for inclusion in our sample. Whilst you typically use random number tables to select the first unit for inclusion in your sample, the remaining units are selected in an ordered way (e.g., every 9th student). To understand more about systematic random sampling, how to create a systematic random sample, and the advantages and disadvantages of this probability sampling technique, see the article: Systematic random sampling.
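The procedure can be sketched as follows. This is a minimal illustration, assuming the students are numbered 1 to 10,000 and that we want the sample of 200 used earlier, which gives a sampling interval of 10,000 / 200 = 50 (the interval in your own study will depend on your population and sample size). The function name is hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Pick a random start, then take every k-th unit after it."""
    rng = random.Random(seed)
    interval = population_size // sample_size  # k, the sampling interval
    start = rng.randint(1, interval)           # random start within the first interval
    return [start + i * interval for i in range(sample_size)]

selected = systematic_sample(population_size=10_000, sample_size=200, seed=7)
print(f"Interval: {selected[1] - selected[0]}")
print(f"First five units: {selected[:5]}")
print(f"Sample size: {len(selected)}")
```

Only the starting point is random, so the order of the list matters: if the list has a repeating pattern that lines up with the interval, the sample may not be representative.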

• Stratified random sampling

Unlike simple random sampling and systematic random sampling, stratified random sampling is used when we are interested in particular strata (i.e., groups) within the population (e.g., males vs. females; houses vs. apartments). With the stratified random sample, there is an equal chance (probability) of selecting each unit from within a particular stratum (group) of the population when creating the sample. To understand more about stratified random sampling, how to create a stratified random sample, and the advantages and disadvantages of this probability sampling technique, see the article: Stratified random sampling.
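One way this might be carried out is sketched below. The stratum sizes (6,000 female and 4,000 male students) are hypothetical, and the sketch assumes proportional allocation, where each stratum's share of the sample matches its share of the population; other allocation rules exist and are not shown here.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Simple random sample within each stratum, sized proportionally."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Proportional allocation; rounding means the stratum sizes may not
        # always add up exactly to total_sample_size.
        stratum_sample_size = round(total_sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, stratum_sample_size)
    return sample

# Hypothetical strata for a population of 10,000 students.
strata = {
    "female": [f"F{i}" for i in range(1, 6001)],
    "male": [f"M{i}" for i in range(1, 4001)],
}

sample = stratified_sample(strata, total_sample_size=200, seed=42)
for name, units in sample.items():
    print(f"{name}: {len(units)} students sampled")
```

Within each stratum, every unit has the same chance of selection, which is the property described above.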