There are a number of different levels of replication discussed in the literature. As noted earlier, replication is not just about duplication. Here, we are interested in generalisation, which, whilst it involves duplication, goes further. In the sections that follow, we explain (a) what generalisation is and (b) how to decide what type of generalisation to test.
Generalisation aims to examine whether the findings from the original study can be generalised; that is, whether the findings hold across populations, settings/contexts, treatments and time. To illustrate what we mean by these types of generalisation, we use Study #1 below to provide some background. Just imagine that (a) you are doing a replication-based dissertation, (b) this is the study you want to replicate, and (c) you are taking on Route B: Generalisation.
Study #1:
  The impact of teaching method on exam performance
We want to examine how two different teaching methods (i.e., the independent variable) affect the exam performance (i.e., the dependent variable) of university students. More specifically, we want to know if the addition of a seminar class to traditional lecturing improves exam performance, and if so, by how much. This is important because the university only has a limited budget, so it would not want to add seminar classes to lectures if students' exam performance was not significantly improved as a result. The course in question is Research Methods 101.
Students took an exam at the beginning of the course (i.e., the pre-test) to determine their general aptitude for the subject matter (i.e., their natural ability in Research Methods). This was done to ensure that the two groups being investigated (i.e., the control group and treatment group) were more or less equal in terms of natural ability. Each group consisted of 50 students who were randomly assigned to their respective groups. For the next 12 weeks (i.e., the duration of the course), the control group were given the 'normal' teaching method, which consisted of two 1-hour long lectures each week. During this same period, the treatment group were given the same two 1-hour lectures each week, but also attended one 1-hour seminar each week. At the end of the 12 weeks, the students from the control group and the treatment group were given the same Research Methods 101 exam (i.e., the post-test). The goal of the experiment was to compare the differences in the scores on the dependent variable (i.e., exam performance) between the two groups (i.e., the control and treatment groups).
When the pre- and post-test scores of the control group (i.e., lectures only) and treatment group (i.e., lectures and seminars) are compared, the results suggest that students who received lectures and seminars (i.e., the treatment group) outperformed the students who only received lectures (i.e., the control group) by an average (i.e., mean) of 6.3% (out of 100%).  As such, we may argue that seminar attendance (i.e., one level of the independent variable) increases (i.e., causes an increase in) exam performance (i.e., the dependent variable).
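One plausible way to read the comparison described above is as a difference in mean gain scores (post-test minus pre-test) between the two groups. The short Python sketch below illustrates that calculation only. The function names and all of the scores are hypothetical, invented for illustration; they are not the data from Study #1, and the original study may have compared the groups in a different way.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average improvement (post-test minus pre-test) for one group."""
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


def difference_in_gains(control_pre, control_post, treatment_pre, treatment_post):
    """How much more the treatment group improved than the control group."""
    return mean_gain(treatment_pre, treatment_post) - mean_gain(control_pre, control_post)


# Hypothetical exam scores (out of 100) for four students per group.
# These numbers are made up and are NOT taken from Study #1.
control_pre = [50, 55, 60, 45]
control_post = [58, 61, 66, 55]
treatment_pre = [52, 54, 58, 46]
treatment_post = [66, 68, 70, 62]

effect = difference_in_gains(control_pre, control_post, treatment_pre, treatment_post)
print(f"Treatment group improved by {effect} more marks on average")
```

In a real dissertation you would also test whether such a difference is statistically significant, rather than relying on the raw difference in means alone.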
Drawing on Study #1 above, let's imagine how we may want to make generalisations from these results if we were doing a replication-based dissertation following Route B: Generalisation. The four types of generalisation are: generalisations across populations; generalisations across treatments; generalisations across settings/contexts; and generalisations across time.
Sometimes we want to know if the results from the original study can be generalised across populations. For example, in Study #1 we were interested in undergraduate students at a single university in the United Kingdom. If we wanted to test whether generalisations could be made across populations, we might ask ourselves: Would the new teaching methods be as effective amongst postgraduate students as undergraduate students at the university?
Your replication-based dissertation could test whether the findings from the original study hold if the populations being studied were different.
Experimental and quasi-experimental research designs involve specific treatments. By treatments, we are referring to the interventions that are made in experiments. For example, in Study #1, we gave the 50 students in the control group two 1-hour long lectures each week for 12 weeks, whilst we gave the 50 students in the treatment group the same two 1-hour lectures each week, but also one 1-hour seminar each week for the 12-week period. Therefore, the characteristics of the treatment in this experiment include the number of lectures (i.e., 24 lectures), the number of seminar classes (i.e., 12 classes), the interval between each of these lectures/seminars (i.e., 1 week), the length of the lectures (i.e., 1 hour each) and seminars (i.e., 1 hour each), and the time period of the experiment (i.e., 12 weeks).
The question arises: Do the treatment characteristics have to be the same when applied to different populations or settings/contexts to arrive at the same conclusions? In other words, would the results from Study #1 be significantly different if the characteristics of the treatment were altered? By significantly, we mean that the results are sufficiently different such that we cannot make the same conclusions about studies where the characteristics of the treatment are different.
Going back to Study #1, what if the control group and treatment group were given the same amount of learning time? After all, both groups receive two 1-hour long lectures each week, but the treatment group receives an additional 1-hour long seminar class. What if, instead, the control group received three 1-hour long lectures each week whilst the treatment group received two 1-hour lectures and one 1-hour seminar? Would Study #1 still show that seminar attendance increased exam performance? Similarly, what if we simply decided to cut the number of seminars in half? Or what if we extended the learning period from 12 weeks to 16 weeks?
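To see how these alternative treatments differ in terms of total learning time, it can help to work out the contact hours for each design. The sketch below does this arithmetic in Python. The function name and its parameters are hypothetical, and modelling "half the seminars" as 0.5 seminars per week (e.g., one seminar every two weeks) is an assumption made purely for illustration.

```python
def contact_hours(weeks, lectures_per_week, seminars_per_week=0.0,
                  lecture_hours=1.0, seminar_hours=1.0):
    """Total teaching hours a group receives over the whole course."""
    weekly_hours = (lectures_per_week * lecture_hours
                    + seminars_per_week * seminar_hours)
    return weeks * weekly_hours


# Original design in Study #1: 12 weeks, two 1-hour lectures per week,
# plus one 1-hour seminar per week for the treatment group only.
original_control = contact_hours(weeks=12, lectures_per_week=2)
original_treatment = contact_hours(weeks=12, lectures_per_week=2, seminars_per_week=1)

# Alternative 1: give the control group a third lecture so that
# both groups receive the same amount of learning time.
equalised_control = contact_hours(weeks=12, lectures_per_week=3)

# Alternative 2: cut the number of seminars in half
# (assumed here to mean one seminar every two weeks).
halved_seminars = contact_hours(weeks=12, lectures_per_week=2, seminars_per_week=0.5)

# Alternative 3: extend the learning period from 12 to 16 weeks.
extended_treatment = contact_hours(weeks=16, lectures_per_week=2, seminars_per_week=1)

print(original_control, original_treatment)   # 24.0 36.0
print(equalised_control, halved_seminars, extended_treatment)  # 36.0 30.0 48.0
```

In the original design, the treatment group receives 12 more contact hours than the control group, so any improvement in exam performance could reflect the extra learning time rather than the seminar format itself. Equalising contact hours is one way a replication could separate these two explanations.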
The question arises: Why do we care about such differences in the characteristics of treatments? Let's go back to Study #1 again.
In Study #1, we only looked at a single university in the United Kingdom, but imagine that we wanted to make generalisations to a wider population such as all universities in the United Kingdom. Clearly, not all universities in the United Kingdom only provide 1-hour long lectures and 1-hour long seminars. Some use 2-hour lectures and 45 minute seminars (amongst other combinations). We need to ask ourselves: Would the new teaching method be as effective if the lectures and/or seminars were longer or shorter? We can only make generalisations across treatments if the answer to this question is YES. After all, if the answer is NO, the conclusions from our study cannot be generalised across treatments; that is, our conclusions are not externally valid across treatments.
Therefore, a good replication-based dissertation could simply test whether the findings from the original study hold if the treatments were different.
Quantitative research typically focuses on a single or small number of settings/contexts. This is often done to control for potential extraneous/confounding variables, but also to reduce research time and costs. The question arises: Would the same result have been found in a different setting/context?
If we wanted to make generalisations from the results in Study #1 to another setting/context, we might ask ourselves the following questions: Would the new teaching method be as effective in Australia or the United States as it was in the United Kingdom? Would the new teaching method be as effective if taught online (e.g., through live streaming of lectures and group videoconferencing for seminars) rather than in a traditional, physical setting?
Your replication-based dissertation could test whether the findings from the original study hold if the settings/contexts being studied were different. This may mean carrying out the original study, but in different organisational types, industries, countries, cultures, and so forth.
With the exception of longitudinal studies, which are rarely conducted at the undergraduate and master's dissertation level, the results from quantitative research tend to reflect a snapshot in time. By a snapshot in time, we mean that most experiments (a) are conducted within a specific time period (e.g., the 12 weeks in Study #1), and (b) take measurements that are time-dependent; that is, obtain data that could only be collected within that time period (e.g., the exam scores in Study #1 reflected the students' ability at that point in time). Therefore, when we conclude that seminar attendance (i.e., one level of the independent variable) increases (i.e., causes an increase in) exam performance (i.e., the dependent variable), this reflects the lectures and seminars that were given during a 12-week period, and the exam performance of students at the end of that period.
The question arises: Would the results hold over time? In other words, if we conducted this experiment at some point in the future (e.g., 5 years from now), would we get the same results? If we feel that the answer is YES, perhaps because we imagine teaching methods and student ability to be fairly constant over the next 5 years, we could argue that our results are generalisable across time.
However, time affects experimental conditions in different ways, which determines whether generalisations can be made. For example, studies that focus on culture at the national level (e.g., the Chinese culture, German culture, etc.) are more likely to be generalisable over time than studies of culture in a single organisation. This is because national cultures often change very little, and when they do, such change tends to take place over decades. By comparison, even an organisation with a strong culture could witness a relatively rapid change (e.g., over months or a few years) if it were acquired by another organisation with a vastly different culture (e.g., an organisation with a power culture taking over an organisation with a people culture).
When making generalisations across time, care must be taken to assess whether the population, treatments and/or settings/contexts are likely to change over the time period you want to generalise to. For example, you are unlikely to make generalisations over all time, but rather over a particular time period (e.g., a number of months, a few years, or perhaps even decades).
Your replication-based dissertation could test whether the findings from the original study hold across time, assuming there is a good reason why time could have undermined the findings from the original study if that study were conducted today.
You will often recognise journal articles that report a replication involving generalisation from their titles (see the examples below):
Example titles
Smoking initiation and schizophrenia: A replication study in a Spanish sample (Gurpegui et al., 2005)
Association of IL2RA and IL2RB with rheumatoid arthritis: A replication study in a Dutch population (Kurreeman et al., 2009)
Measuring customer orientation of salespeople: A replication with industrial buyers (Michaels & Day, 1985)
Comparing detection methods for software requirements inspections: A replication using professional subjects (Porter & Votta, 1998)
How to decide what type of generalisation to test
Broadly speaking, deciding what type of generalisation to test comes down to balancing your personal interests and academic reasons:
Personal interests
To say that personal interests can determine what type of generalisation you intend to test is not particularly academic, but it is inevitable that this will play a role. For example, you may be interested in the findings of the original study, but not the population or setting/context where the study was conducted. Your interests may lie in a different population or setting/context. Whilst it is useful from an academic perspective to have a justification for replicating a study in a different population or setting/context, perhaps because you feel that there is a good reason why this different population or setting/context would yield different results, it can also be useful to simply see how far an original study generalises (i.e., by testing a wider range of different populations or settings/contexts, even if you are not sure why these might yield different results).
Academic reasons
Whilst personal interests only take you so far, it is often important to have a good academic reason to justify the type of generalisation that you want to test. For example, there may be particular characteristics of different populations that would suggest the results from the original study would not hold. In the case of making generalisations across settings/contexts, the academic literature may point to a range of reasons (e.g., cultural, functional, economic, political, and other reasons) why the results from the original study would not hold.
Whilst it's worth reading on to see if you would rather pursue a dissertation based on Route C: Extension, it's worth reiterating that if you want to pursue Route B: Generalisation, you need to decide what type of generalisation you are going to make (population, treatment, setting/context, or time-based), and the personal and academic reasons for this.