Probability sampling represents a **group** of **sampling techniques** that help researchers to select **units** from a **population** that they are interested in studying. Collectively, these **units** form the **sample** that the researcher studies [see our article, Sampling: The basics, to learn more about terms such as **unit**, **sample** and **population**]. A core characteristic of probability sampling techniques is that units are selected from the population at **random** using **probabilistic methods**. This enables researchers to make **statistical inferences** (i.e., **generalisations**) from the sample being studied to the population of interest. This article discusses the **principles** of probability sampling and briefly sets out the **types** of probability sampling technique discussed in detail in other articles within this site. The article is divided into two sections: **principles of probability sampling** and **types of probability sampling**:

There are a number of **theoretical** and **practical** reasons for using probability sampling: **(a)** making statistical inferences; **(b)** achieving a representative sample; **(c)** minimising sampling bias; **(d)** selecting units using probabilistic methods; and **(e)** meeting the criteria for probability sampling. Each of these **basic principles** of probability sampling is discussed in turn. However, to provide some context for these basic principles, we will use the following example.

Career choices of all students at the University of Oxford, England

Imagine we were interested in examining **the career choices of all students at the University of Oxford, England**. The students would be our **units**; all of those students studying at the University of Oxford in England would be our **population**; and the career choices of these students would reflect the **phenomenon** that we are interested in studying. If we choose to **sample** 250 of these students, our **sample size** would be 250 units.

Now let's look at each of these basic principles of probability sampling:

When we study a **sample** of a **population**, the immediate task is often to **analyse the data** that we have collected from that **sample**. In the case of the 250 students that make up our sample at the University of Oxford, we would analyse the data that we had collected about their career choices. For example, we may have asked the following **closed question**:

Question

What factors influence your choice of career?

Let's imagine we provided the student with the following **options**:

Options [tick all that apply]

Career prospects

Nature of the work

Physical working conditions

Salary and benefits

Other

If *Other*, please state what this is...

When we analysed all the responses from the 250 students in our sample, let's imagine that **career prospects** was by far the most important factor influencing the students' career choices, with the **remaining options** (e.g., salary and benefits, physical working conditions) not being at all important.

This tells us something interesting about the **sample**, but we actually want to know about the **population**, not just the sample. We want to know if our **findings** from the **sample** can be **generalised** to the **population**. In other words, we want to make **generalisations** about the **population** from our **sample**. This is particularly important if we were following a **quantitative research design**.

However, when we use **probability sampling** to **select units** from the population to include in our sample, with the aim of making generalisations from the sample to the population, we use the more precise term, **statistical inferences**, instead of **generalisations**. This reflects the fact that probability sampling uses **statistical techniques** to make such generalisations; techniques that also allow us to know **how accurately** our sample **represents** the population of interest. In other words, when we state that **career prospects** was the most important factor influencing the career choices of **all students** at the University of Oxford (i.e., our population), based on our sample of 250 students at the university, we want to know **how confident** we are that this is the case. Probability sampling helps us to make such **statistical inferences** and assess **how confident** we are about such inferences.
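To make the idea of "how confident" more concrete, the following is a minimal sketch of one common way to express that confidence: an approximate 95% confidence interval for a sample proportion. The figure of 180 out of 250 students is hypothetical, the function name is our own, and the calculation assumes a simple random sample and uses the normal approximation, ignoring any finite population correction.

```python
import math

def proportion_confidence_interval(successes, n, z=1.96):
    """Normal-approximation interval for a sample proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    # Clamp the bounds so they stay within the valid range for a proportion.
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Hypothetical figures: 180 of the 250 sampled students ticked "Career prospects".
p_hat, lower, upper = proportion_confidence_interval(180, 250)
print(f"Sample proportion: {p_hat:.3f}")
print(f"Approximate 95% interval: {lower:.3f} to {upper:.3f}")
```

A narrower interval indicates a more precise estimate of the population proportion; all else being equal, a larger sample narrows the interval.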

A critical component of probability sampling is the need to create a sample that is **representative** of the population. The **more representative** the sample is of the population, the **more confident** we can be when making **statistical inferences** (i.e., **generalisations**) from the sample to the population of interest.

If **all units** within the population were **identical in all respects** there would be no need to sample at all. Under this scenario of **perfect homogeneity of units**, we could simply study a **single unit** since this would reflect the population perfectly.

For example:

If all students at the University of Oxford were of the same age and gender, had the same levels of ambition, the same attitudes and desires, and so forth, we could simply ask one student **what factors influenced his/her career choice**. The answer to this could then be generalised to the whole population; so if the **single student** stated that **salary and benefits** was the most important factor, we would be able to say that the same was true for **all students** at the university.

However, populations **are not homogeneous** (i.e., not the same), especially populations that consist of people. Therefore, we need to make sure when sampling that we create a sample that is representative of the population we are interested in; that is, a sample that has the **same variations** that we would see in the population. When a sample is not representative of the population, this can lead to **bias**.

In sampling, **bias** means that the units selected from the population for inclusion in your sample are **not representative** of that population. When units are selected from a population for inclusion in a sample, it is inevitable that there will be some sampling bias. However, one of the reasons to use **probability sampling** is that it is particularly effective at helping to **minimise** such sampling bias compared with **non-probability sampling**.

Sampling bias can occur for a number of reasons, but biases are often practical in nature. Let's look at two types of bias:

Sampling frames and populations

The **sampling frame** is very similar to the **population** you are studying, and may be **exactly the same**. When selecting units from the population to be included in your sample, probability sampling requires that you obtain a **list of the population** from which you select units. This **list** is the **sampling frame** from which you select units.

For example:

In the case of the students at the University of Oxford, this would mean that we need to obtain a **list** of **all students** studying at the university. Such a list would most likely be held by the department known as Student Records. The list would likely contain details about each student (e.g., name, age, gender, degree programme, email address, etc.).

Sampling bias occurs when the sampling frame and the population are **not consonant** (i.e., not the same). This could happen if we were unable to obtain permission to get access to the list of the population we are interested in, which is a common occurrence.

For example:

It is quite likely that Student Records would be unwilling to provide access to a list of all students at the University of Oxford for reasons of confidentiality and privacy. However, even if Student Records were prepared to provide access to the list under certain conditions, the list may not be 100% up-to-date. For example, there may have been a new intake of students and the list may not yet have been updated.

Whilst such **differences** between the sampling frame and the population we are interested in may be small, they still lead to sampling bias. Such sampling bias can result in researchers making **over-generalisations** because a particular **characteristic** and/or **group** (i.e., **stratum**) from the population is **under** or **over-represented** in the sample.

Conscious and unconscious human choices

Researchers have **conscious** and **unconscious biases** that can affect how they select units from the population for inclusion in their sample. For example, if approaching people in the high street, researchers may consciously choose to approach people that they feel are more like themselves. This may even be an unconscious action. They may also choose to select units based on ease or lower cost. These are some of the typical **sampling biases** of **non-probability sampling techniques** such as **convenience sampling**.

Probability sampling is particularly effective (compared with non-probability sampling) at **reducing** this type of sampling bias. It achieves this through the use of **probabilistic methods** in the selection of units from the population for inclusion in the sample. We discuss this next.

A cornerstone of probability sampling is the use of **random selection** (i.e., **probabilistic methods**) to help you **select units** from your **sampling frame** to be included in your **sample**. The purpose of random selection is the creation of a sample whose units are representative of (i.e., have very similar characteristics to) the population they represent. With random selection, each unit has an equal chance (i.e., equal probability) of being selected. The use of random selection not only improves the chance of creating a representative sample, but also provides you with methods to estimate how likely (i.e., probable) this will be.

With probability sampling, units can be randomly selected with the aid of **random number tables** or a **random number generator**. However, the **procedure** to select units from the sampling frame differs depending on the **type** of probability sampling technique that is used. Nonetheless, these procedures are very clearly defined, making it easy to follow them. We briefly discuss these different types of sampling technique later in this article [see the section: Types of probability sampling].
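The claim that random selection gives each unit an equal chance can be checked empirically. The sketch below is illustrative only: it uses a small hypothetical sampling frame of 20 units, Python's built-in random number generator, and a helper function of our own naming. It repeats the selection many times and records how often each unit ends up in the sample.

```python
import random
from collections import Counter

def inclusion_frequencies(frame, sample_size, repetitions, seed=0):
    """Estimate how often each unit in the frame is selected.

    Draws `repetitions` independent random samples (without replacement)
    and returns the share of samples in which each unit appeared.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    counts = Counter()
    for _ in range(repetitions):
        counts.update(rng.sample(frame, sample_size))
    return {unit: counts[unit] / repetitions for unit in frame}

# Hypothetical sampling frame of 20 units.
frame = [f"student_{i}" for i in range(1, 21)]
freqs = inclusion_frequencies(frame, sample_size=5, repetitions=20000)

expected = 5 / len(frame)  # sample size divided by frame size
print(f"Expected inclusion probability: {expected:.2f}")
print(f"Lowest observed frequency:  {min(freqs.values()):.3f}")
print(f"Highest observed frequency: {max(freqs.values()):.3f}")
```

If selection is genuinely random, every observed frequency should sit close to the expected value, with no unit systematically favoured.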

Probability sampling may be considered the ideal for research guided by a **positivist** or **post-positivist research paradigm** and a **quantitative research design**, as well as **quantitative research methods** [see the sections, **Research Paradigms**, Research Designs and Research Methods if you are unfamiliar with terms such as **research paradigms**, **quantitative research designs** and **quantitative research methods**]. However, for students doing a dissertation at the undergraduate or master's level, it can often be very difficult to get access to and/or find a **list of the population** you are interested in (i.e., a list of all units that make up your **sampling frame**). Nonetheless, for probability sampling to be possible, you must be able to find and/or put together such a **list** and get access to it. This is perhaps the most challenging criterion affecting the use of probability sampling, which often leads student researchers to use **non-probability sampling techniques** instead [see the article: Non-probability sampling].

We can say that the basic principle of probability sampling is to ensure that the sample being studied is representative of the population of interest. This helps to minimise potential sampling bias that would reduce your ability to make generalisations (i.e., statistical inferences) from the sample to the population. To minimise sampling bias, probabilistic methods are used so that units from the population are selected at random; the objective is that each unit has an equal chance of being selected. To achieve this, a list of the population must be attainable and this sampling frame must be the same as (or similar to) the population being studied. Different probability sampling techniques are subsequently used to select units from the population to create your sample, depending on the context of the population you are studying.

If you are considering whether to use probability sampling, it is important to consider how your choice of research strategy will influence whether this is an appropriate decision. Even if you know that probability sampling fits with the research strategy guiding your dissertation, it is important to choose the appropriate type of probability sampling technique. These probability sampling techniques are briefly set out in the next section.

There are three **types** of probability sampling technique that you may use when doing a dissertation at the undergraduate and master's level: **simple random sampling**, **systematic random sampling** and **stratified random sampling**.

To get a sense of what these three types of probability sampling technique are, imagine that a researcher wants to understand more about the career goals of students at a single university. Let's say that the university has roughly 10,000 students. These 10,000 students are our population (**N**). Each of the 10,000 students is known as a unit (although sometimes other terms are used to describe a unit; see Sampling: The basics). In order to select a sample (

Simple random sampling

With simple random sampling, there is an equal chance (probability) that each of the 10,000 students could be selected for inclusion in our sample. If our desired sample size was around 200 students, we would select 200 students at random, probably using random number tables. To understand more about simple random sampling, how to create a simple random sample, and the advantages and disadvantages of this probability sampling technique, see the article: Simple random sampling.
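As a rough illustration, a simple random sample of 200 students from a frame of 10,000 could be drawn with a random number generator rather than random number tables. The sketch below assumes the sampling frame is simply a list of hypothetical student ID numbers from 1 to 10,000; the function name is our own.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Select `sample_size` units from the frame at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(frame, sample_size)

# Hypothetical sampling frame: student ID numbers 1 to 10,000.
frame = list(range(1, 10001))
sample = simple_random_sample(frame, 200, seed=42)

print(f"Selected {len(sample)} students")
print("First five selected IDs:", sample[:5])
```

Fixing the seed makes the selection reproducible, which can be useful when you need to document how the sample was drawn.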

Systematic random sampling

Systematic random sampling is a variation on simple random sampling. Like simple random sampling, there is an equal chance (probability) that each of the 10,000 students could be selected for inclusion in our sample. Whilst you typically use random number tables to select the first unit for inclusion in your sample, the remaining units are selected in an ordered way (e.g., every 9th student). To understand more about systematic random sampling, how to create a systematic random sample, and the advantages and disadvantages of this probability sampling technique, see the article: Systematic random sampling.
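The procedure can be sketched as follows. This is an illustrative version only: it assumes the sampling interval is calculated as the frame size divided by the desired sample size (a common convention, giving every 50th student for 200 out of 10,000), and that the starting point is chosen at random within the first interval. The frame of student ID numbers and the function name are hypothetical.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick a random start, then take every k-th unit from the frame."""
    interval = len(frame) // sample_size  # k, the sampling interval
    if interval < 1:
        raise ValueError("sample_size cannot exceed the size of the frame")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position within the first interval
    return [frame[start + i * interval] for i in range(sample_size)]

# Hypothetical sampling frame: student ID numbers 1 to 10,000.
frame = list(range(1, 10001))
sample = systematic_sample(frame, 200, seed=7)

print("Sampling interval:", len(frame) // 200)
print("First five selected IDs:", sample[:5])
```

Because the remaining units follow from the starting point, the ordering of the frame matters: if the list has a repeating pattern that coincides with the interval, the sample may be unrepresentative.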

Stratified random sampling

Unlike with the simple random sample and the systematic random sample, with the stratified random sample we are interested in particular strata (meaning groups) within the population (e.g., males vs. females; houses vs. apartments, etc.). With the stratified random sample, there is an equal chance (probability) of selecting each unit from within a particular stratum (group) of the population when creating the sample. To understand more about stratified random sampling, how to create a stratified random sample, and the advantages and disadvantages of this probability sampling technique, see the article: Stratified random sampling.
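One way to picture this is to split the sampling frame into strata and then draw a simple random sample within each stratum. The sketch below is a minimal illustration under stated assumptions: the frame of 10,000 students, the 6,000/4,000 split between the two strata, and the number of units taken from each stratum are all hypothetical, as is the function name.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Draw a simple random sample of the given size within each stratum.

    `stratum_of` maps a unit to its stratum label; `sizes` maps each
    stratum label to the number of units to select from it.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for stratum, size in sizes.items():
        sample.extend(rng.sample(strata[stratum], size))
    return sample

# Hypothetical frame of (student ID, stratum) pairs: 6,000 female, 4,000 male.
frame = [(i, "female" if i <= 6000 else "male") for i in range(1, 10001)]

# Hypothetical allocation: 200 units split in proportion to stratum size.
sample = stratified_sample(
    frame, stratum_of=lambda unit: unit[1], sizes={"female": 120, "male": 80}, seed=1
)

print("Units selected:", len(sample))
print("Female units:", sum(1 for _, s in sample if s == "female"))
```

Here the allocation is proportional to stratum size, but the same function would accept a different allocation if a small stratum needed to be deliberately over-sampled.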