Sampling: The basics

Sampling is an important component of any piece of research because of the significant impact that it can have on the quality of your results (or findings). If you are new to sampling, there are a number of key terms and basic principles that act as a foundation to the subject. This article explains these key terms and basic principles. Rather than taking a comprehensive look at sampling, the article presents the sampling basics that you would need to know if you were an undergraduate or master’s level student about to undertake a dissertation (or similar piece of research). It also provides links to other articles within the Sampling Strategy section of this website that you may find useful.

This article is divided into two sections:

  • Key terms of sampling

  • Basic principles of sampling

Each of these sections is discussed in turn:

Key terms of sampling

Some of the key sampling terms you will come across include:

  • Population

  • Units

  • Sample

  • Sample size

  • Sampling frame

  • Sampling techniques

  • Sampling bias

Each is discussed in turn:

  • Population

  • In research, the word population has a different meaning from its everyday use. Typically, we refer to the population of a country (or region), such as the United States or Great Britain. However, in research (and the theory of sampling), a population signifies the units that we are interested in studying. These units could be:

    • People

    • Cases (i.e. organisations, institutions and countries)

    • Pieces of data

    Some examples of each of these types of population are presented below:

    • People

    • Students enrolled at a university (e.g. Harvard University) or studying a particular course (e.g. Statistics 101)
      United States Senators or Congressmen who are Democrats
      Users of Facebook or Twitter
      Presidents and CEOs of Fortune 500 or FTSE 100 companies
      Nurses working at hospitals in the State of Texas

    • Cases

    • Recruitment agencies in Greater London, England
      Law firms in Manhattan, New York, United States
      The World Trade Organisation (WTO)
      The European Parliament
      Countries that are members of NATO
      Signatories of the Helsinki Accords

    • Pieces of data

    • Customer transactions at Wal-Mart or Tesco between two time points (e.g. 1st April 2009 and 31st March 2010)
      The braking distances (e.g. in metres) of a particular model of car
      University applications in the United States in 2011
      Households with broadband subscriptions in the town of Carmarthen, Wales

    When thinking about the population you are interested in studying, it is important to be precise. For example, if we say that our population is users of Facebook, this would imply that we were interested in all 500 million (or more) Facebook users, irrespective of what country they were in, whether they were male or female, what age they were, how often they used Facebook, and so forth. However, if the population you were interested in was more specific, you should make this clear. Perhaps our population is not Facebook users, but frequent, male Facebook users in the United States. When we come to describe our population further, we would also need to define what we meant by frequent users (e.g. people that log in to Facebook at least once a day).
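To make the idea of a precisely defined population concrete, the short Python sketch below writes the definition down as explicit inclusion rules. The `User` record, its fields and the five example users are hypothetical, and treating "frequent" as at least seven logins per week is only our rough stand-in for "at least once a day".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """A hypothetical record for one Facebook user (one unit)."""
    user_id: int
    country: str
    gender: str
    logins_per_week: int


def is_frequent(user: User, min_logins_per_week: int = 7) -> bool:
    # Assumption: "frequent" is approximated as 7 or more logins per week.
    return user.logins_per_week >= min_logins_per_week


def in_population(user: User) -> bool:
    # The population: frequent, male Facebook users in the United States.
    return user.country == "US" and user.gender == "male" and is_frequent(user)


# Made-up users, purely for illustration.
all_users = [
    User(1, "US", "male", 14),
    User(2, "US", "female", 21),
    User(3, "GB", "male", 10),
    User(4, "US", "male", 2),
    User(5, "US", "male", 7),
]

population = [u for u in all_users if in_population(u)]
print([u.user_id for u in population])  # [1, 5]
```

Writing the rules out in this way forces you to state every criterion (country, gender, frequency of use) rather than leaving any of them implied.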

  • Units

  • As discussed above, the population that you are interested in can consist of units, cases or pieces of data. These terms can sometimes be used interchangeably. On this website, we use the word units whenever we are referring to those things that make up a population. However, since you may find other textbooks referring to these units as cases or pieces of data, we have provided some further clarification below:

    • People
    • Cases
    • Pieces of data

    • People
      The population you are interested in consists of one or more units. For example, if the population we were interested in was all 500 million (or more) Facebook users, each of these Facebook users would be a unit. So we would have 500 million (or more) units in our population. If we were interested in CEOs (or Presidents) of Fortune 500 companies, the CEOs (or Presidents) would be our units.

    • Cases
      Sometimes the word units is replaced with the word cases. As highlighted in the population examples above, sometimes the populations we are interested in are organisations, institutions and countries. In such cases, it is often more appropriate to refer to each of these (e.g. recruitment agencies, law firms) as cases. You may be interested in a population that consists of only one case (e.g. the World Trade Organisation or European Parliament) or maybe you are interested in a population that has many cases (e.g. recruitment agencies in London, of which there must be hundreds).

    • Pieces of data
      Finally, researchers sometimes refer to populations consisting of data (or pieces of data) instead of units or cases. For example, researchers may be interested in customer transactions at a particular supermarket (e.g. Wal-Mart or Tesco) between two time points (e.g. 1st April 2009 and 31st March 2010); perhaps because they want to examine the effect of certain promotions on sales figures.

  • Sample

  • When we are interested in a population, it is often impractical and sometimes undesirable to try to study the entire population. For example, if the population we were interested in was frequent, male Facebook users in the United States, this could be millions of users (i.e. millions of units). If we chose to study these Facebook users using interviews (i.e. our chosen research method), it could take a lifetime. Therefore, we choose to study just a sample of these Facebook users.

    Whilst we discuss more about sampling and why we sample later in this article, the important point to remember here is that a sample consists of only those units (in this case, Facebook users) from our population of interest (i.e. X million frequent, male, Facebook users in the United States) that we actually study (e.g. 500 or 1000 of these Facebook users).

  • Sample size

  • The sample size is simply the number of units in your sample. In the example above, the sample size selected may be just 500 or 1000 of the Facebook users that are part of our population of frequent, male, Facebook users in the United States.

    In practice, the sample size that is selected for a study can have a significant impact on the quality of your findings, with sample sizes that are either too small or excessively large both potentially leading to incorrect findings. As a result, sample size calculations are sometimes performed to determine how large your sample size needs to be to avoid such problems. However, these calculations can be complex, and are typically not performed at the undergraduate and master’s level when completing a dissertation.
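To give a feel for what a sample size calculation can look like, the sketch below uses one widely taught formula for estimating a proportion, n0 = z²p(1 − p)/e², with an optional finite population correction. This formula is our own choice of illustration and covers only one kind of study; the function name and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin_of_error, confidence=0.95,
                               expected_proportion=0.5, population_size=None):
    """Approximate sample size needed to estimate a proportion.

    Assumes simple random sampling and a normal approximation.
    """
    # z-value for a two-sided interval at the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = expected_proportion
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Finite population correction: smaller populations need fewer units.
        n = n / (1 + (n - 1) / population_size)
    return math.ceil(n)


# A margin of error of 5 percentage points at 95% confidence.
print(sample_size_for_proportion(0.05))
# The same precision when the population has only 2,000 units.
print(sample_size_for_proportion(0.05, population_size=2000))
```

Setting `expected_proportion` to 0.5 is the most cautious choice, since it gives the largest required sample size for a given margin of error.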

  • Sampling frame

  • The sampling frame is very similar to the population you are studying, and may be exactly the same. When selecting units from the population to be included in your sample, it is sometimes desirable to get hold of a list of the population from which you select units. This is the case when using certain types of sampling technique (i.e. probability sampling techniques), which we discuss later in the article. This list can be referred to as the sampling frame. We explain more about sampling frames in the article: Probability sampling explained.

  • Sampling techniques

  • As we have mentioned above, when we are interested in a population, we typically study a sample of that population rather than attempt to study the entire population (e.g. just 500 of the X million frequent, male Facebook users in the United States). If we imagine that our desired sample size was just 500 of these Facebook users, the question arises: How do we know which Facebook users to invite to take part in our study? In other words, which Facebook users will become part of our sample?

    The purpose of sampling techniques is to help you select units (e.g. Facebook users) to be included in your sample (e.g. of 500 Facebook users). Broadly speaking, there are two groups of sampling technique:

    • Probability sampling techniques

    • Non-probability sampling techniques

    Each of these groups of sampling technique is discussed below:

    • Probability sampling techniques

    • Probability sampling techniques use random selection (i.e. probabilistic methods) to help you select units from your sampling frame (i.e. similar to or exactly the same as your population) to be included in your sample. These procedures (i.e. probabilistic methods) are very clearly defined, making it easy to follow them. Since the characteristics of the sample that researchers are interested in vary, different types of probability sampling technique exist to help you select the appropriate units to be included in your sample. These types of probability sampling technique include simple random sampling, systematic random sampling, stratified random sampling and cluster sampling.

      We discuss probability sampling in more detail in the article, Probability sampling explained. We also discuss each of these different types of probability sampling technique, how to carry them out, and their advantages and disadvantages [see the articles: Simple random sampling: An overview, Systematic random sampling: An overview, Stratified random sampling: An overview].
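As a rough illustration of how three of these techniques differ, the Python sketch below draws a sample of 50 units from a made-up sampling frame of 1,000 users. The frame, the age bands and the function names are all hypothetical, and the stratified version assumes proportional allocation, which is only one of several ways to allocate units to strata.

```python
import random


def simple_random_sample(frame, n, rng):
    # Every unit in the frame has an equal chance of selection.
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    # Pick a random starting point, then take every k-th unit.
    k = len(frame) // n  # sampling interval
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


def stratified_sample(frame, n, stratum_of, rng):
    # Split the frame into strata, then sample randomly within each one.
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        # Proportional allocation; rounding may make the total differ
        # slightly from n when the shares are not whole numbers.
        share = round(n * len(units) / len(frame))
        sample.extend(rng.sample(units, share))
    return sample


# Hypothetical sampling frame: (user_id, age_band) for 1,000 users.
frame = [(i, "18-29" if i < 600 else "30+") for i in range(1000)]
rng = random.Random(1)  # fixed seed so the run is repeatable

srs = simple_random_sample(frame, 50, rng)
systematic = systematic_sample(frame, 50, rng)
stratified = stratified_sample(frame, 50, lambda unit: unit[1], rng)

print(len(srs), len(systematic), len(stratified))
```

In each case the final choice of units is left to chance rather than to the researcher, which is what makes these probability sampling techniques.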

    • Non-probability sampling techniques

    • Non-probability sampling techniques rely on the subjective judgement of the researcher when selecting units from the population to be included in the sample. For some of the different types of non-probability sampling technique, the procedures for selecting units to be included in the sample are very clearly defined, just like probability sampling techniques. However, in others (e.g. purposive sampling), the subjective judgement required to select units from the population, which involves a combination of theory, experience and insight from the research process, makes selecting units more complicated. Overall, the types of non-probability sampling technique you are likely to come across include quota sampling, purposive sampling, convenience sampling, snowball sampling, and self-selection sampling.

      We discuss non-probability sampling in more detail in the article, Non-probability sampling explained. We also discuss each of these different types of non-probability sampling technique, how to carry them out, and their advantages and disadvantages [see the articles: Quota sampling: An overview, Purposive sampling: An overview, Convenience sampling: An overview, Snowball sampling: An overview, Self-selection sampling: An overview].
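Quota sampling is one of the non-probability techniques whose procedure is clearly defined, so it lends itself to a short sketch. In the Python example below, units are accepted in the order the researcher happens to meet them until each quota is full; no random selection is involved. The function name, the quotas and the list of people are hypothetical.

```python
def quota_sample(units_in_order_met, quotas, category_of):
    """Accept units as they are encountered until every quota is filled."""
    remaining = dict(quotas)
    sample = []
    for unit in units_in_order_met:
        category = category_of(unit)
        if remaining.get(category, 0) > 0:
            sample.append(unit)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas are full
    return sample


# Made-up people in the order they were approached: (name, age band).
people_met = [
    ("Ann", "18-29"), ("Bob", "18-29"), ("Cat", "18-29"),
    ("Dan", "30+"), ("Eve", "18-29"), ("Fay", "30+"), ("Gus", "30+"),
]

sample = quota_sample(people_met, {"18-29": 2, "30+": 2}, lambda p: p[1])
print([name for name, _ in sample])  # ['Ann', 'Bob', 'Dan', 'Fay']
```

Because the order in which people are met decides who is included, the result depends on where and when the researcher collects data, which is why the technique is not a probability sampling technique.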

    The choice of sampling technique that you use to select units for your sample will vary depending on the research strategy that is guiding your dissertation. We discuss this, together with the basic principles of sampling in the section: Basic principles of sampling.

  • Sampling bias

  • Sampling bias occurs when the units that are selected from the population for inclusion in your sample are not characteristic of (i.e. do not reflect) the population. This can lead to your sample being unrepresentative of the population you are interested in.

    For example, imagine that you want to measure how often residents of New York go to a Broadway show in a given year. Clearly, standing along Broadway and asking people as they pass by how often they went to Broadway shows in a given year would not make sense, because a higher proportion of those passing by are likely to have just come out of a show. The sample would therefore be biased.

    For this reason, we have to think carefully about the types of sampling techniques we use when selecting units to be included in our sample. Some sampling techniques, such as convenience sampling, a type of non-probability sampling (which reflected the Broadway example above), are prone to greater bias than probability sampling techniques.
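The effect of sampling bias can be illustrated with a small simulation in the spirit of the Broadway example. Everything in the Python sketch below is invented for illustration: the distribution of shows attended per year, and the assumption that the chance of meeting someone on Broadway grows with the number of shows they attend.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Made-up population: shows attended per year by 100,000 residents.
# Most residents attend none; a small group attends often.
population = rng.choices([0, 1, 6], weights=[80, 15, 5], k=100_000)

# Probability sample: every resident has an equal chance of selection.
random_sample = rng.sample(population, 500)

# Convenience sample on Broadway: we assume the chance of passing by is
# proportional to 1 + shows attended (drawn with replacement for simplicity).
passing_by_weights = [1 + shows for shows in population]
convenience_sample = rng.choices(population, weights=passing_by_weights, k=500)

print("population mean:        ", round(mean(population), 2))
print("random sample mean:     ", round(mean(random_sample), 2))
print("convenience sample mean:", round(mean(convenience_sample), 2))
```

Under these assumptions, the convenience sample overstates how often residents attend shows, whereas the random sample stays close to the population figure.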

In the section that follows, we discuss some of these key terms in the context of the basic principles of sampling.

Basic principles of sampling

NOTE:
Before reading the basic principles of sampling below, we suggest that you first familiarise yourself with the key terms above: population, units, sample, sample size, sampling frame, sampling techniques and sampling bias.

As student researchers, many of us are interested in studying something about a particular population:

  • Biologists and sports scientists may try to understand the functions of human beings.

  • Social scientists and psychologists may be interested in the behaviour of human beings.

  • Economists may be interested in building models that explain the results of such behaviour.

  • Business students may be interested in finding ways to influence such behaviour.

Such populations of interest may include people (e.g. students enrolled at a university, CEOs of Fortune 500 companies), organisations and countries (e.g. law firms in New York, members of NATO), or pieces of data (e.g. customer transactions at a single supermarket, households with broadband subscriptions in a small town).

Irrespective of what the population of interest is, it is often not feasible to study the entire population (e.g. for reasons of cost and time) and it is typically unnecessary to do so. Instead, we study a sample of the population (i.e. the sample should ideally be a representation of the population with similar characteristics). For example, if the population we were interested in was frequent, male Facebook users in the United States, which total in the millions, we may choose to only study a sample of say 500 or 1000 of these Facebook users. This reflects our desired sample size (i.e. the 500 or 1000 Facebook users).

However, the dilemma is how we should select units from the population to be included in our sample. In other words, how do we determine which of the millions of frequent, male Facebook users in the United States should be included in our sample of just 500 or 1000 Facebook users?

To answer this question, we need to look to the research strategy guiding your research. All dissertations are guided by an overarching research strategy, which shapes everything from the research paradigm, research design and research methodology influencing your dissertation, through to your choices of research methods, sampling strategy, data analysis techniques, and even your approach to research ethics [see the article, Dissertation Research Strategy: Getting started, if you are unfamiliar with any of these terms]. Whilst you can learn more about the impact of your research strategy on your sampling strategy in the article, Research strategy and sampling strategy, the following two choices of research design provide some insight:

  • Quantitative research designs and probability sampling

  • Qualitative research designs and non-probability sampling

Each of these is discussed in turn:

  • Quantitative research designs and probability sampling

  • If your dissertation is being guided by a quantitative research design, it is likely that you want to be able to make generalisations (i.e. statistical inferences) from your sample to the broader population you are studying. For example, imagine we were interested in how often these frequent, male Facebook users in the United States paid to use a Facebook app. Let’s say that we collected this data using structured interviews (i.e. our research method), and having analysed the data, we arrived at our findings (i.e. our results). Whilst we only examined the responses of, let’s say, 500 of these Facebook users, we would want to be able to make generalisations about the millions of frequent, male Facebook users in the United States, from which the sample was drawn.

    To achieve this, we need to make sure that the sample we study has very similar characteristics to the population we are interested in. As a result, we choose to select units (i.e. Facebook users) from the population to include in our sample using random selection (i.e. probabilistic methods). This use of random selection is the cornerstone of probability sampling techniques (e.g. simple random sampling, systematic random sampling, stratified random sampling). Using such probability sampling techniques helps to reduce the potential sampling bias that would otherwise be present if we were to select units without using random selection. It also provides us with tools to assess the quality of our findings. You can find out more about probability sampling in the article: Probability sampling explained.
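One example of the tools that probability sampling gives us is the margin of error. The Python sketch below simulates a population of Facebook users, draws a simple random sample of 500, and attaches an approximate 95% confidence interval to the sample proportion who have paid to use an app. The simulated population and its 30% figure are invented, and the interval uses the normal approximation, which is only one of several possible methods.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(2024)  # fixed seed so the run is repeatable

# Simulated population: True means the user has paid to use an app.
# The 30% rate is invented purely for illustration.
population = [rng.random() < 0.30 for _ in range(200_000)]

# Simple random sample of 500 units.
sample = rng.sample(population, 500)
p_hat = sum(sample) / len(sample)  # sample proportion

# Approximate 95% confidence interval (normal approximation).
z = NormalDist().inv_cdf(0.975)
margin = z * math.sqrt(p_hat * (1 - p_hat) / len(sample))
low, high = p_hat - margin, p_hat + margin

print(f"sample proportion: {p_hat:.3f}")
print(f"95% confidence interval: {low:.3f} to {high:.3f}")
```

An interval of this kind is only meaningful because the units were selected at random; the same arithmetic applied to a convenience sample would not justify generalising to the population.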

  • Qualitative research designs and non-probability sampling

  • If your dissertation is being guided by a qualitative research design, it is likely that your focus will be primarily on understanding the intricacies of your sample, with any desire to make generalisations from your sample to the population a secondary consideration (or perhaps not even a consideration at all). If we take the example above, in a dissertation guided by a qualitative research design, we are more likely to be interested in why these frequent, male Facebook users in the United States paid to use a Facebook app; not how often they did so. Rather than collecting data through structured interviews, we may select an alternative research method, such as focus groups or unstructured interviews, which are more appropriate when using a qualitative research design.

    Having collected the data, also from a sample of 500 (or perhaps far fewer) of these Facebook users, we focus on what we have learnt from just this sample. We do not make the assumption, like we do when using a quantitative research design and probability sampling techniques, that we can make generalisations from this sample. We have to be much more careful about doing this [see the article, Research strategies and research quality (coming soon), to understand some of the reasons why]. However, since our focus is on the sample of Facebook users, other types of sampling techniques can be more appropriate, especially non-probability sampling techniques (e.g. purposive sampling; other non-probability sampling techniques include snowball sampling, self-selection sampling, convenience sampling, and quota sampling). You can find out more about non-probability sampling in the article: Non-probability sampling explained.

Whilst the two examples above are crude generalisations of the use of qualitative and quantitative research designs, as well as probability and non-probability sampling, the important point is that the purpose of sampling varies depending on (a) the goals of your dissertation and (b) the research strategy that you use to guide the research process. Again, you can learn more about this from the article: Research strategy and sampling strategy.