Consideration 2: Broader considerations in the data analysis process for a dissertation

CONSIDERATION TWO

Broader considerations

There are a number of broader considerations that you need to take into account during the data analysis process: (a) the need to focus on analysis that answers your immediate research hypotheses; (b) the opportunity to dig deeper into your data; (c) how the main journal article can help you make decisions about your statistical analysis; (d) the need to defend your statistical choices; (e) how far you should take your analysis; and (f) potential problems you may face, and ways to deal with these problems. Each of these considerations is discussed in turn:

Focus on analysis that answers your immediate research hypotheses

The primary purpose of data analysis in your dissertation is to answer your research hypotheses. Whilst this may sound obvious, a common mistake in dissertations is to forget this and simply view the data you collect as something that you can mine for answers. Whilst it can be worth digging deeper into your data, something we discuss in the next section, the primary purpose of your data analysis is to answer the research hypotheses you set. There are a number of reasons for this: (a) the research strategy you set (i.e., your choice of research design, research methods and sampling strategy) was designed to answer these hypotheses; (b) the theoretical case you built reflected the constructs and differences/relationships that were presented through your hypotheses; (c) the ethical approval you sought for your dissertation, whether formal or informal, reflected the desire to answer these hypotheses; and (d) adding new hypotheses could have ethical implications (something we also discuss in the next section).

It can also be worth digging deeper into your data

In quantitative research, it is important to first set hypotheses, and then test them. However, students will often set hypotheses and test them, but then go back through the data they have collected, analyse it for additional differences or relationships, and create new hypotheses (i.e., this usually involves adding new hypotheses rather than revising existing ones). A classic example of this involves data collected through surveys where demographic data has been recorded (e.g., gender, educational background, etc.), but was not included in the hypotheses that were set.

To illustrate this, imagine that the original research hypothesis was: Drivers are more likely to break the speed limit at night than during the day. In the survey of drivers, demographics such as the gender of drivers were recorded. Now, when it comes to the data analysis phase, rather than simply summarising the demographics to illustrate the make-up of the sample that you used (i.e., to show the proportion of male to female drivers in the sample), gender gets introduced as a construct in a new hypothesis. For example, you create a new hypothesis such as: Male drivers are more likely to break the speed limit at night than during the day compared to female drivers (or some other variation of this hypothesis, depending on whether you are making a prediction or not). When you re-analyse your data to test for the role of gender, you find that there is a statistically significant difference between males and females; males are more likely to break the speed limit at night than during the day compared to females.

The example above illustrates how you can gain more insight into your data by viewing it as something that you can mine for answers. However, if you do this, you should recognise that there are a number of criticisms of this approach, as well as potential problems that you need to take into account: (a) in statistics, if you start to test one thing after another from a single dataset (i.e., you try to test lots of potential relationships or differences between variables), you are bound, eventually, to find something (i.e., a statistically significant relationship or difference) simply by chance (i.e., the result does not reflect a real effect, but is actually a Type I error, also known as a false positive); (b) if you choose to mine the data for answers, you need to correct for multiple comparisons, which reduces the potential for relationships or differences to be found simply by chance (we discuss this in the Data Analysis part of Lærd Dissertation, within the relevant statistical tests); and (c) you need to consider the reason (i.e., theoretical justification) to test for additional relationships or differences (e.g., why add gender to the above hypothesis? Why do you think that gender would make a difference from a theoretical perspective?).
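
The way chance findings accumulate when you run many tests can be illustrated with a short calculation. The sketch below is only an illustration in Python and is not taken from the Lærd Dissertation guides: it assumes that the tests are independent, uses the conventional significance level of 0.05, and applies the Bonferroni correction as one common (and fairly conservative) way to correct for multiple comparisons. The function names and the p-values are hypothetical.

```python
from typing import List


def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_significant(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Flag each p-value as significant at the Bonferroni-adjusted threshold."""
    if not p_values:
        raise ValueError("p_values must contain at least one value")
    # Bonferroni: divide the significance level by the number of tests run.
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical p-values from three exploratory tests (not real data).
example_p_values = [0.004, 0.03, 0.20]

if __name__ == "__main__":
    # The chance of a false positive grows as more tests are run.
    for num_tests in (1, 5, 20):
        print(num_tests, round(familywise_error_rate(0.05, num_tests), 3))
    # Only results that survive the adjusted threshold are flagged.
    print(bonferroni_significant(example_p_values))
```

Under these assumptions, running 20 independent tests at the 0.05 level gives roughly a 64% chance of at least one false positive, which is why a result such as p = 0.03 that looks significant on its own may no longer count once the correction is applied.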

Whether or not you are able to (a) find justifications for further analysis and (b) correct for multiple comparisons, the important point is that you acknowledge the choices you made, so that the reader knows which hypotheses were part of your initial study, supported by the theoretical case and research strategy you set, and which are the product of data mining after the fact.

Where possible, use the data analysis in the main journal article to help you

Since you are taking on a Route #1: Replication-based dissertation, you can learn a lot about the appropriate statistical analysis to run from the main journal article, especially in dissertations based on Route A: Duplication or Route B: Generalisation because the measurement procedure is the same (or extremely similar).

If you haven't already done so during STAGE FIVE: Building the theoretical case, learn about the statistical analysis used in the main journal article. The data analysis techniques will often be mentioned in the abstract, but if not, they will certainly be set out in the research strategy section of the journal article (N.B., this section will more often be called the methods or methodology section). In some cases, authors include a data analysis section where they set out the data analysis techniques they used, and the reasons they used them. In particular, look for: (a) the way that variables are treated (e.g., whether they are grouped or weighted); (b) how the authors treated outliers and missing data (if they mention this); and (c) the different statistical tests that were run on the data.

Whilst the nature of your data may lead you to run different statistical tests from those in the main journal article (as mentioned in STEP ONE: Select the correct statistical tests to run on your data), the statistical analysis used can help point you in the right direction, as well as providing useful explanations of the statistical decisions that the authors made.

Be prepared to defend your statistical choices because it's not uncommon for supervisors to get the statistics wrong

Your supervisor should be an academic, but not all academics have a good knowledge of statistics. A high proportion of supervisors will not have completed a doctoral degree, such that the extent of their statistics knowledge will come from having completed an undergraduate and master's dissertation, together with any subject-specific statistics courses (e.g., An Introduction to Statistics in Biological Sciences, Psychology, Education, etc.). This is one reason why some journal articles with multiple authors include one academic who has the statistics knowledge to analyse the data that has been collected (although this person is often not a statistician, but an academic with knowledge of quantitative data analysis).

It is important to know this because during the data analysis phase, it is not uncommon for students to ask their supervisors for help, or for supervisors to suggest which statistical tests should be run. Under normal circumstances, this would be a good thing, but there are many occasions where supervisors recommend the wrong statistical tests and, in some instances, instruct students to run them (NOTE: We are not suggesting that they do this on purpose, but it often happens because they lack the relevant statistics knowledge or they have not properly looked at your data). The problem is that when a supervisor tells a student to run a particular statistical test, it can be difficult to run a different test, even when the student feels (or knows) that the supervisor is wrong. This is understandably difficult where the supervisor is also one of your markers, not to mention the fact that students generally don't like to call out a supervisor for making a mistake; there is, after all, an assumption that the supervisor knows best. However, you need to be prepared to defend your statistical choices when you feel that they are the correct ones. To help you make the right choices, especially when it comes to choosing the correct statistical tests to run in the first instance, look to our Statistical Test Selector.

As a final caveat, this is not to say that every supervisor lacks the statistics knowledge needed to help you. At the very least, look at your supervisor's publication record (if they have published), which is often displayed on the department's or school's website, and check whether they have published any quantitative research (i.e., you should be able to tell from the abstract of a journal article whether it involves quantitative data analysis). At the same time, remember that many supervisors perform qualitative research rather than quantitative research, and you will not necessarily have been allocated a supervisor who has experience with quantitative research (unless you are able to choose your supervisor).

How far to take your analysis

Even though you are taking on a quantitative dissertation, this does not mean that you are a statistics expert. In fact, most undergraduate and master's level students who take on quantitative dissertations will have very little statistics knowledge (i.e., possibly just an introductory class in statistics, which may have included the use of a statistics package such as SPSS, but this knowledge is often forgotten by the time you come to do your dissertation). As you will have read above, we do have articles that help you to select the appropriate statistical tests to use in your dissertation, statistically analyse your data using SPSS, and interpret the output from such analysis in the Data Analysis section of Lærd Dissertation. We even do this assuming little or no knowledge of statistics and SPSS, using straightforward, non-technical language. However, there is no question that some students find statistics harder than others.

As such, it is important at this stage that you bear this in mind when choosing what analysis to perform. As a general rule of thumb, if you are someone that finds statistics more challenging than most, avoid journal articles that mention statistical tests such as structural equation modelling (SEM) or partial least squares (PLS). You may even want to avoid statistical tests such as principal components analysis (PCA), factor analysis, logistic regression, and loglinear analysis, although we do have articles that will walk you through these statistical tests, step-by-step, in the Data Analysis section of Lærd Dissertation. On the other hand, if you did well in your introductory statistics class, took further statistics modules during your degree, and/or have a supervisor with a quantitative research background, you can always consider taking on more challenging statistical analysis.

We talk about choosing how far to take your analysis because sometimes there is more than one way to analyse your data, and you can moderate your goals in order to perform simpler statistical tests that are easier to understand. Where there are such choices, we make them clear within the Data Analysis section of Lærd Dissertation.

Potential problems you can face and ways to deal with them

In most cases at the undergraduate and master's level, you shouldn't face major problems analysing your data. The use of our Statistical Test Selector, coupled with the comprehensive, step-by-step statistical guides that we provide in the Data Analysis section of Lærd Dissertation, should get you through most scenarios. However, there are a number of problems that you can face when analysing your data, some of which can be solved, whilst others prove to be more tricky. These problems arise from things such as: (a) an insufficient sample size; (b) unequal amounts of data being collected for groups when differences are being compared; (c) an inability to find an easy solution when the assumptions for a given statistical test have been violated; (d) uncertainty over how to treat outliers and missing data; and (e) other factors that we discuss in the Data Analysis section of Lærd Dissertation.
