Sampling: The Basics

Sampling is an important component of any piece of research because of the significant impact that it can have on the quality of your results and findings. If you are new to sampling, there are a number of key terms and basic principles that form the foundation of the subject. This article explains these key terms and basic principles. Rather than offering a comprehensive look at sampling, the article presents the sampling basics that you would need to know if you were an undergraduate or master's level student about to carry out a dissertation (or similar piece of research). It also provides links to other articles within the Sampling Strategy section of this website that you may find useful. Some of the key sampling terms you will come across include population, units, sample, sample size, sampling frame, sampling bias and sampling techniques. Each is discussed in turn:

Population
Units
Sample
Sample size
Sampling frame
Sampling bias
Sampling techniques

Population

The word population has a different meaning in research from the way we think about a population in everyday use. Typically, we refer to the population of a country (or region), such as the United States or Great Britain. However, in research (and the theory of sampling), a population signifies the units that we are interested in studying. These units could be people, cases or pieces of data. Some examples of each of these types of population are presented below:

People
Students enrolled at a university (e.g., Harvard University) or studying a particular course (e.g., Statistics 101)
United States Senators or Congressmen who are Democrats
Users of Facebook or Twitter
Presidents and CEOs of Fortune 500 or FTSE 100 companies
Nurses working at hospitals in the State of Texas
Cases (i.e., organisations, institutions, countries, etc.)
Recruitment agencies in Greater London, England
Law firms in Manhattan, New York, United States
The World Trade Organisation (WTO)
The European Parliament
Countries that are members of NATO
Signatories of the Helsinki Accord
Pieces of data
Customer transactions at Wal-Mart or Tesco between two time points (e.g., 1st April 2009 and 31st March 2010)
The braking distances (in metres) of a particular model of car
University applications in the United States in 2011
Households with broadband subscriptions in the town of Carmarthen, Wales

When thinking about the population you are interested in studying, it is important to be precise. For example, if we say that our population is users of Facebook, this would imply that we were interested in all 500 million (or more) Facebook users, irrespective of which country they were in, whether they were male or female, what age they were, how often they used Facebook, and so forth. If the population you are interested in is more specific, you should make this clear. Perhaps our population is not Facebook users, but frequent, male Facebook users in the United States. When we come to describe our population further, we would also need to define what we meant by frequent users (e.g., people who log in to Facebook at least once a day).
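One way to make a population definition precise is to write it down as an explicit inclusion rule that every unit either meets or does not. The Python sketch below does this for frequent, male Facebook users in the United States. The User record, its field names and the five example users are hypothetical, and "frequent" is encoded as logging in on all seven days of the week, following the "at least once a day" definition above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """One unit of the population: a single Facebook user (hypothetical fields)."""
    user_id: int
    country: str
    gender: str
    days_logged_in_per_week: int  # 0 to 7


def in_population(user: User) -> bool:
    """Inclusion rule for 'frequent, male Facebook users in the United States'.

    'Frequent' is operationalised (an assumption) as logging in at least
    once a day, i.e. on all 7 days of the week.
    """
    return (
        user.country == "United States"
        and user.gender == "male"
        and user.days_logged_in_per_week == 7
    )


# Hypothetical users; only those meeting every criterion belong to the population.
users = [
    User(1, "United States", "male", 7),
    User(2, "United States", "female", 7),
    User(3, "Canada", "male", 7),
    User(4, "United States", "male", 3),
    User(5, "United States", "male", 7),
]

population = [u.user_id for u in users if in_population(u)]
print(population)  # [1, 5]
```

Writing the rule out like this forces each vague word ("frequent", "in the United States") to be pinned down before any sampling begins.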

Units

As discussed above, the population that you are interested in consists of units, which can be people, cases or pieces of data. These terms are sometimes used interchangeably. On this website, we use the word units whenever we are referring to those things that make up a population. However, since you may find other textbooks referring to these units as people, cases, or pieces of data, we have provided some further clarification below:

The population you are interested in consists of one or more units. For example, if the population we were interested in was all 500 million (or more) Facebook users, each of these Facebook users would be a unit. So we would have 500 million (or more) units in our population. If we were interested in CEOs (or Presidents) of Fortune 500 companies, the CEOs (or Presidents) would be our units.
Sometimes the word units is replaced with the word cases. As highlighted in the population examples above, sometimes the populations we are interested in are organisations, institutions and countries. In such situations, it is often more appropriate to refer to each of these (e.g., recruitment agencies, law firms) as cases. You may be interested in a population that consists of only one case (e.g., the World Trade Organisation or European Parliament), or you may be interested in a population that has many cases (e.g., recruitment agencies in London, of which there are likely to be hundreds).
Finally, researchers sometimes refer to populations consisting of data (or pieces of data) instead of units or cases. For example, researchers may be interested in customer transactions at a particular supermarket (e.g., Wal-Mart or Tesco) between two time points (e.g., 1st April 2009 and 31st March 2010); perhaps because they want to examine the effect of certain promotions on sales figures.

Sample

When we are interested in a population, it is often impractical and sometimes undesirable to try to study the entire population. For example, if the population we were interested in was frequent, male Facebook users in the United States, this could be millions of users (i.e., millions of units). If we chose to study these Facebook users using structured interviews (i.e., our chosen research method), it could take a lifetime. Therefore, we choose to study just a sample of these Facebook users.

Whilst we discuss more about sampling and why we sample later in this article, the important point to remember here is that a sample consists of only those units (in this case, Facebook users) from our population of interest (i.e., X million frequent, male Facebook users in the United States) that we actually study (e.g., 500 or 1000 of these Facebook users).

Sample Size

The sample size is simply the number of units in your sample. In the example above, the sample size might be just 500 or 1000 of the Facebook users that are part of our population of frequent, male Facebook users in the United States.

In practice, the sample size that is selected for a study can have a significant impact on the quality of your results and findings. A sample that is too small may lack the precision (or statistical power) needed to detect real effects, whereas an excessively large sample wastes time and resources and can make trivially small differences appear statistically significant. As a result, sample size calculations are sometimes performed to determine how large your sample needs to be to avoid such problems. However, these calculations can be complex, and are typically not performed at the undergraduate and master's level when completing a dissertation.
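Although such calculations are often unnecessary for a dissertation, a minimal sketch shows what one looks like. It assumes the goal is to estimate a population proportion (e.g., the share of users who agree with a statement) using simple random sampling, with a 95% confidence level (z ≈ 1.96) and the conservative guess p = 0.5. The function name is hypothetical, and other study designs (e.g., comparing two groups) need different formulas.

```python
import math
from typing import Optional


def sample_size_for_proportion(margin_of_error: float,
                               z: float = 1.96,
                               p: float = 0.5,
                               population_size: Optional[int] = None) -> int:
    """Minimum sample size for estimating a population proportion.

    n0 = z^2 * p * (1 - p) / e^2, optionally followed by the finite
    population correction n = n0 / (1 + (n0 - 1) / N).
    Assumes simple random sampling; p = 0.5 gives the largest (most
    conservative) n when the true proportion is unknown.
    """
    n = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)  # round up: a sample is made of whole units


print(sample_size_for_proportion(0.05))                        # 385
print(sample_size_for_proportion(0.05, population_size=1000))  # 278
```

Under these assumptions, a margin of error of ±5 percentage points needs about 385 units, or about 278 if the whole population has only 1,000 units; the finite population correction matters most when the sample is a sizeable fraction of the population.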

Sampling frame

The sampling frame is the list of units from which your sample is actually drawn. It is very similar to the population you are studying, and may be exactly the same, but in practice some units in the population may be missing from the list (or the list may include units that are not part of the population). When selecting units from the population to be included in your sample, it is sometimes necessary to obtain such a list. This is the case when using certain types of sampling technique (i.e., probability sampling techniques), which we discuss later in the article. We explain more about sampling frames in the article: Probability sampling.
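Because a sampling frame is a list, it can be checked against the population it is meant to represent. The sketch below uses hypothetical unit IDs and a hypothetical function, frame_coverage, to report which population units are missing from the frame (and so can never be selected) and which listed units fall outside the population.

```python
def frame_coverage(population: set[str], frame: list[str]) -> tuple[set[str], set[str], float]:
    """Compare a sampling frame (the list we can select from) with the population.

    Returns (units missing from the frame, listed units outside the population,
    share of the population that appears in the frame).
    """
    listed = set(frame)
    missing = population - listed      # undercoverage: these can never be sampled
    extra = listed - population        # overcoverage: these should not be sampled
    covered = len(population & listed) / len(population)
    return missing, extra, covered


# Hypothetical IDs: the population of interest vs. a slightly out-of-date register.
population = {"A01", "A02", "A03", "A04", "A05", "A06"}
frame = ["A01", "A02", "A03", "A05", "A07"]

missing, extra, covered = frame_coverage(population, frame)
print(sorted(missing), sorted(extra), round(covered, 2))  # ['A04', 'A06'] ['A07'] 0.67
```

When the frame and the population are exactly the same, both sets come back empty and coverage is 1.0.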

Sampling bias

Sampling bias occurs when the units that are selected from the population for inclusion in your sample are not characteristic of (i.e., do not reflect) the population. This can lead to your sample being unrepresentative of the population you are interested in.

For example, suppose you want to measure how often residents of New York go to a Broadway show in a given year. Standing along Broadway and asking people as they pass by how often they went to Broadway shows in a given year would not make sense, because a disproportionately high share of those passing by are likely to have just come out of a show. The sample would therefore be biased towards frequent theatregoers.
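The Broadway example can be illustrated with a small simulation. The Python sketch below is purely hypothetical: it invents a distribution of how many shows each of 100,000 New York residents attends per year, and it assumes that a resident's chance of being stopped on Broadway is proportional to 1 plus the number of shows they attend. It then compares a simple random sample of 500 residents with a "street" convenience sample of 500.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is reproducible


def shows_per_year(rng: random.Random) -> int:
    """Hypothetical distribution of Broadway shows attended per resident per year."""
    u = rng.random()
    if u < 0.6:
        return 0                   # most residents never go
    if u < 0.9:
        return rng.randint(1, 2)   # occasional theatregoers
    return rng.randint(5, 20)      # frequent theatregoers


# Population: one value per (hypothetical) New York resident.
population = [shows_per_year(rng) for _ in range(100_000)]
true_mean = mean(population)

# Simple random sample: every resident is equally likely to be chosen.
srs = rng.sample(population, k=500)

# Convenience sample on Broadway: the chance of walking past is assumed to be
# proportional to (1 + shows attended). random.choices draws with replacement.
convenience = rng.choices(population, weights=[1 + s for s in population], k=500)

print(f"true mean:          {true_mean:.2f}")
print(f"simple random:      {mean(srs):.2f}")
print(f"street convenience: {mean(convenience):.2f}")
```

Under these assumptions, the simple random sample's average lands close to the true population average, while the street sample's average is pulled well above it because frequent theatregoers are over-represented. Drawing with replacement is a simplification that makes little difference when the sample is tiny relative to the population.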

For this reason, we have to think carefully about the types of sampling techniques we use when selecting units to be included in our sample. Some sampling techniques, such as convenience sampling (a type of non-probability sampling, and the approach used in the Broadway example above), are prone to greater bias than probability sampling techniques. We discuss sampling techniques further next.

Sampling techniques

As we have mentioned above, when we are interested in a population, we typically study a sample of that population rather than attempt to study the whole population (e.g., just 500 of the X million frequent, male Facebook users in the United States). If we imagine that our desired sample size was just 500 of these Facebook users, the question arises: how do we decide which Facebook users to invite to take part in our sample? In other words, which Facebook users will become part of our sample?

The purpose of sampling techniques is to help you select units (e.g., Facebook users) to be included in your sample (e.g., of 500 Facebook users). Broadly speaking, there are two groups of sampling technique: probability sampling techniques and non-probability sampling techniques.

Probability sampling techniques
Probability sampling techniques use random selection (i.e., probabilistic methods) to help you select units from your sampling frame (i.e., a list that is similar or exactly the same as your population) to be included in your sample. These procedures are very clearly defined, making them easy to follow. Because researchers need samples with different characteristics, different types of probability sampling technique exist to help you select the appropriate units to be included in your sample. These types of probability sampling technique include simple random sampling, systematic random sampling, stratified random sampling and cluster sampling.
We discuss probability sampling in more detail in the article, Probability sampling. We also discuss each of these different types of probability sampling technique, how to carry them out, and their advantages and disadvantages [see the articles: Simple random sampling, Systematic random sampling and Stratified random sampling].
Non-probability sampling techniques
Non-probability sampling techniques rely on the subjective judgement of the researcher when selecting units from the population to be included in the sample. For some types of non-probability sampling technique, the procedures for selecting units to be included in the sample are very clearly defined, just like probability sampling techniques. However, in others (e.g., purposive sampling), the subjective judgement required to select units from the population, which involves a combination of theory, experience and insight from the research process, makes selecting units more complicated. Overall, the types of non-probability sampling technique you are likely to come across include quota sampling, purposive sampling, convenience sampling, snowball sampling and self-selection sampling.
We discuss non-probability sampling in more detail in the article, Non-probability sampling. We also discuss each of these different types of non-probability sampling technique, how to carry them out, and their advantages and disadvantages [see the articles: Quota sampling, Purposive sampling, Convenience sampling, Snowball sampling and Self-selection sampling].
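The selection procedures named above can be sketched in Python to show how they differ. The sketch assumes a hypothetical sampling frame of 100 students, where year group serves as the stratum and tutor group (10 students each) as the cluster; all function names are illustrative. The four probability techniques rely on a random number generator, whereas quota sampling, one of the non-probability techniques with a clearly defined procedure, simply accepts units in the order they turn up until each quota is filled.

```python
import random
from collections import defaultdict

# Hypothetical sampling frame of 100 students: (id, year group, tutor group).
frame = [(i, "year1" if i < 60 else "year2", i // 10) for i in range(100)]


def stratum_of(unit):
    return unit[1]  # year group


def cluster_of(unit):
    return unit[2]  # tutor group


def simple_random(frame, n, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(frame, n)


def systematic(frame, n, rng):
    """Random start, then every k-th unit on the list."""
    k = len(frame) // n            # sampling interval
    start = rng.randrange(k)       # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


def stratified(frame, n, rng):
    """Proportionate stratified sampling: a simple random sample within each stratum.

    Rounding the per-stratum shares can make the total differ slightly from n.
    """
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for units in strata.values():
        share = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, share))
    return sample


def cluster(frame, n_clusters, rng):
    """One-stage cluster sampling: randomly pick whole clusters, keep all their units."""
    clusters = defaultdict(list)
    for unit in frame:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


def quota(arrivals, quotas):
    """Quota sampling (non-probability): accept units in arrival order until
    each stratum's quota is full. No random selection is involved."""
    filled = defaultdict(int)
    sample = []
    for unit in arrivals:
        s = stratum_of(unit)
        if filled[s] < quotas.get(s, 0):
            sample.append(unit)
            filled[s] += 1
    return sample


rng = random.Random(2011)  # fixed seed for reproducibility
print([u[0] for u in simple_random(frame, 10, rng)])
print([u[0] for u in systematic(frame, 10, rng)])
print(sorted(u[0] for u in stratified(frame, 10, rng)))
print(sorted({cluster_of(u) for u in cluster(frame, 2, rng)}))
print([u[0] for u in quota(frame, {"year1": 6, "year2": 4})])  # [0, 1, 2, 3, 4, 5, 60, 61, 62, 63]
```

Stratified sampling here uses proportionate allocation (60% year 1, 40% year 2), which is only one of several possible allocation choices. Note that the quota sample always contains the first students to "arrive", which is exactly why non-probability techniques can carry more bias.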

If you want to know more about the sampling techniques you may use in your dissertation, read up on probability sampling and non-probability sampling.