Transcript of Introduction to study design
Adrian Bauman: Hello, my name's Adrian Bauman from the School of Public Health at Sydney University and this video is about study design. It has a particular focus on population health contexts and health services.
Evaluation is important to the NSW Ministry of Health; in fact, it is whole-of-government policy to evaluate and be accountable for all programs in the NSW Government.
This monograph, 'Study Design for Evaluating Population Health and Health Service Interventions', is a guide produced by the NSW Ministry of Health because they're committed to evaluating programs in order to develop and implement evidence-based policies and programs. This is important because it's public expenditure that we're using, so we need to be sure that the programs we are disseminating actually work. This introduction to study design is freely available online at any time.
This document is particularly focused on study design for assessing program effectiveness (in other words, do interventions work?), but there are also sections on planning an evaluation and on the different study design types, some of which we'll discuss in this brief video.
When you're thinking about any interventional program, why would you evaluate it? What's the program trying to achieve? What do you need to understand about the program for it to be useful for population health or health services planning and practice? It's important to plan an evaluation in advance to decide what design you're going to need and why. Every evaluation needs a study design to be developed and worked through before you begin.
For each purpose of evaluation you may need a different kind of study design. Mostly in the NSW Ministry of Health you'll look at program impact: what effect is the program having on people who participate in it? How long is that effect maintained? Sometimes in practice you'll do evaluation for program improvement or for program monitoring and accountability. Increasingly we are evaluating things for research translation, which asks: can programs be transferred to other settings and scaled up to reach more people, more hospitals, more health service units, more community health centres and so on?
One important quick note is to distinguish between efficacy and effectiveness. Efficacy is where you design a study under optimal, well-controlled conditions, really to maximise the experimental design and give yourself the most scientific design possible. Effectiveness is testing a program and evaluating it in real-world conditions. These terms are obviously related, both are trying to tell you how well a program works, but efficacy is the more scientific approach and also requires more controlled settings and environments.
When you're planning an evaluation you need to define: the purpose of the evaluation, the program logic, whether the program's ready to be evaluated (evaluability assessment), whether you have the resources and expertise to conduct the research, and whether the timing of your outcome measurement fits with the natural trajectories of when effects are expected to occur. These planning decisions will influence the kinds of study design that you might choose.
Thinking about the level of a program: is it a simple evaluation of a single-component intervention delivered to a single target group, with a linear relationship between exposure to the intervention and the outcomes? That's very amenable to a randomised controlled trial. A more complicated intervention will have multiple components but will still be in one setting, such as a school. Most difficult is a complex or comprehensive public health program evaluation, which has multiple interventions in diverse settings across a region. In that case there may be several study designs, one for each component that you choose to evaluate.
It's important to build a logic model showing the inputs, outputs and outcomes as part of your planning process: identifying what resources you have; the things you're going to do, which are called activities; who you're trying to reach, the participants; and then the short-, medium- and long-term impacts and outcomes of your proposed intervention. Building this early will help you define the study design needs overall, or for each of these specific stages in your logic model.
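If it helps to make this concrete, here is a minimal sketch, with entirely hypothetical content, of a logic model laid out as a plain data structure so that each stage can later be matched to its own measures and study design. The stages follow the inputs-activities-participants-outcomes sequence described above; the specific items are illustrative assumptions, not from the monograph.

```python
# Hypothetical logic model laid out as a data structure (illustrative only).
logic_model = {
    "inputs": ["funding", "staff time", "partner organisations"],
    "activities": ["train facilitators", "deliver group sessions"],
    "participants": ["adults attending community health centres"],
    "short_term_outcomes": ["knowledge and awareness"],
    "medium_term_outcomes": ["behaviour change"],
    "long_term_outcomes": ["reduced disease incidence"],
}

# Print each stage so the evaluation plan can assign measures per stage.
for stage, items in logic_model.items():
    print(f"{stage}: {', '.join(items)}")
```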
Study design is a plan: it says why you should do something and what problem you're trying to solve; it describes what you're going to do in the conduct of the study, the scientific methods you'll use, the hypotheses to be tested, the measures you'll use, the data you need to collect, who's going to collect and interpret that information, and who's going to use it. The purpose of the study will help refine your decisions about the study design that you choose.
Most of what you do in the Ministry of Health will be quantitative or mixed methods in nature. In mixed methods you'll use both quantitative and qualitative designs to support each other or to at least indicate whether the findings are consistent.
Quantitative designs have the strengths of established reliable and valid measures, the potential for statistical testing so you can assess the probability that the effects you observe are due to chance, and a stronger basis for causal inference, that the changes you observed were caused by the intervention you're testing. Some parts of your evaluation will be process evaluation and some parts will be impact or outcome evaluation. You need to distinguish those parts as well; other documents from the Ministry of Health describe process and impact evaluation in more detail.
You choose a study design by framing your research questions, examining your research skills and who's going to do the work, looking at the funds available to conduct the evaluation, and considering the timeframe within which you're expected to produce results and what's reasonable and realistic.
This is the well-known hierarchy of evidence, with systematic reviews and meta-analyses providing the best quality evidence. Underneath that are single randomised controlled trials, quasi-experimental designs, before-after designs and, at the bottom end of the pyramid, post-only and cross-sectional studies that only show correlation, not causation, along with qualitative designs and explanatory studies. In addition, the NH&MRC grades the evidence from A to D based on the number of good quality studies: is there only one randomised controlled trial or are there several? Secondly, are the findings consistent: do the randomised trials point in the same direction? Third, are the results generalisable: are they conducted in representative, more population-like samples so that they're not just confined to the volunteers that attend? And finally, are they of clinical significance and health services feasibility? Those are also important in thinking about your study design choice and your study design evidence review.
When we think about research designs for measuring impact or outcomes we move from pre-experimental designs, the weakest evidence, to experimental designs, the strongest evidence. Randomised controlled trials provide the best evidence because individuals are randomly allocated to receive the intervention or to be controls, so the chance of someone being allocated to the intervention is at random. There is a baseline measurement, shown in the slide as O1, then randomisation, and a follow-up measurement shown as O2; the change between O1 and O2 is compared in those that received the intervention, shown as X in the diagram, versus those that didn't.
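As a minimal sketch of that O1 X O2 logic (the numbers and effect size below are illustrative assumptions, not from the monograph), you can simulate a baseline measurement, randomise individuals to intervention or control, add a follow-up measurement, and compare the change between arms:

```python
# Simulate a simple two-arm randomised controlled trial: O1 (baseline),
# X (intervention) for a random half, O2 (follow-up), then compare the
# change O2 - O1 between arms. All quantities are assumed for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200                                    # total participants (assumed)
baseline = rng.normal(50, 10, n)           # O1: baseline measurement
group = rng.permutation(np.repeat(["intervention", "control"], n // 2))

true_effect = 3.0                          # assumed intervention effect
followup = baseline + rng.normal(0, 5, n)  # O2 in the absence of any effect
followup[group == "intervention"] += true_effect

change = followup - baseline
t, p = stats.ttest_ind(change[group == "intervention"],
                       change[group == "control"])
print(f"mean change (intervention) = {change[group == 'intervention'].mean():.2f}")
print(f"mean change (control)      = {change[group == 'control'].mean():.2f}")
print(f"t = {t:.2f}, p = {p:.3f}")
```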
Sometimes you need to randomly allocate groups or organisational units, because people within the same school or the same workplace may be clustered; in other words, their behaviours or outcomes may be correlated with each other. So you might randomly allocate 20 schools, 10 to control and 10 to intervention. It's still a randomised controlled trial, but with randomisation at the level of the group, and it otherwise follows the same processes as a randomised controlled trial; you've just got to adjust for that clustering effect within groups.
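One simple way to see why that clustering adjustment matters is the design effect, DEFF = 1 + (m - 1) x ICC, which inflates the sample size needed when outcomes are correlated within groups. The sketch below uses assumed numbers (cluster size, intra-cluster correlation and the individually randomised sample size are all illustrative):

```python
# Design effect for a cluster randomised trial: variance inflation when
# whole groups (e.g. schools) rather than individuals are randomised.
def design_effect(cluster_size: int, icc: float) -> float:
    """DEFF = 1 + (m - 1) * ICC, the classic variance inflation factor."""
    return 1 + (cluster_size - 1) * icc

n_individual = 300   # sample size if individuals were randomised (assumed)
cluster_size = 25    # participants measured per school (assumed)
icc = 0.05           # intra-cluster correlation (assumed)

deff = design_effect(cluster_size, icc)
n_cluster_trial = n_individual * deff
print(f"design effect = {deff:.2f}")
print(f"participants needed in the cluster trial = {n_cluster_trial:.0f}")
print(f"schools per arm = {n_cluster_trial / (2 * cluster_size):.1f}")
```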
Non-randomised trials are quite common in population health study designs, and they might be necessary when you can't randomise, for example if you're running a statewide initiative or something to which everyone is exposed, such as a policy. In evaluating those kinds of interventions it may be impossible to separate people at random into intervention and control groups.
So, let's look at the non-randomised designs and their strengths and weaknesses.
Quasi-experimental designs take their name from the Latin quasi, 'as if': experimental in form but not actually randomised. The first design is a before-after design with a comparison group, shown here as Design III, which is a good design for a large-scale program if you can get another region to act as a comparison group. A stronger design is a time series design, because you've got multiple time points to assess trends before the intervention, shown as X, and multiple time points following the intervention, so you can look at whether the intervention influenced those parameters.
A very strong quasi-experimental design is shown as Design V at the lower part of the slide, which is a time series design with a comparison group or comparison region. Pre-experimental designs are weaker. Design II, the before-after design without a control group, is sometimes used, often in pilot studies, and sometimes in statewide studies where there can be no comparison group. But Design I, the after-only study, should never be used for assessing program effects, because you can't tell where people were before they received the intervention. Design I is used for measuring what people thought of the intervention, program satisfaction, and what they did with the intervention, but you cannot use it for assessing change.
There are several advantages of quasi-experimental designs: they're easier and less expensive than randomised controlled trials, they may be closer to real life, and time series data may already be routinely collected, such as health service data, emergency department data, screening data for a particular kind of cancer, or annual surveys and routine data collections. The disadvantages are that they're weaker for assessing whether the intervention caused the changes you observed, they can be biased through self-selection if comparison regions are different to intervention regions, and sometimes a comparison region may not be possible. If you can't have controls, then a before-after design or, even better, a time series design may still be the best available choice, but the observed results may be due to confounding factors unrelated to the intervention.
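To illustrate how a time series design is typically analysed, here is a minimal segmented regression sketch on simulated data: a level and trend before the intervention, plus a change in level and trend after it. The series length, intervention timing and effect sizes are all illustrative assumptions, not from the monograph.

```python
# Segmented regression for an interrupted time series (illustrative data).
import numpy as np

rng = np.random.default_rng(0)
periods = 24                      # e.g. monthly counts over two years (assumed)
t = np.arange(periods)
post = (t >= 12).astype(float)    # intervention X introduced at month 12 (assumed)
time_since = np.where(post == 1, t - 12, 0)

# Simulated outcome: gentle pre-intervention trend, then a drop in level after X.
y = 100 + 0.5 * t - 8 * post + rng.normal(0, 2, periods)

# Design matrix: intercept, underlying trend, level change, trend change.
X = np.column_stack([np.ones(periods), t, post, time_since])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
labels = ["baseline level", "pre-intervention trend",
          "level change after X", "trend change after X"]
for name, b in zip(labels, coef):
    print(f"{name:>24s}: {b:6.2f}")
```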
A more complicated approach is a stepped-wedge design. Here you can see that at Time 1, for example, everybody is a control. At Time 2 only one of the units has been allocated to the intervention and four are controls. At Time 3 two are shaded purple as interventions and three act as controls, so you get a lot of control time as well as a lot of intervention time, which makes it an efficient way of conducting a randomised controlled trial. It may be pragmatic if you want to roll out an intervention sequentially across local health districts or across many workplaces and you haven't got the resources to do it all at once, so it's ideal for this gradual rollout. Sometimes it isn't randomised, you just roll the units out sequentially, and you can iteratively improve the intervention as you learn from your first few time periods or waves of data collection. However, it's not a design that's used very often, because it's organisationally and analytically a little more complicated.
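A minimal sketch of the allocation schedule behind that description, with assumed numbers of units and periods, is below: five units are randomised to the order in which they cross from control to intervention, one unit per period, until all are exposed.

```python
# Build a stepped-wedge schedule: rows are units (e.g. local health districts),
# columns are periods; 0 = control, 1 = intervention. Units are randomised to
# the order in which they step over. Numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)
n_units, n_periods = 5, 6
crossover_order = rng.permutation(n_units)   # randomise when each unit crosses over

schedule = np.zeros((n_units, n_periods), dtype=int)
for rank, unit in enumerate(crossover_order):
    schedule[unit, rank + 1:] = 1            # unit switches on after its step and stays on

print("rows = units, columns = periods (0 = control, 1 = intervention)")
print(schedule)
```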
The purpose of good study design is to understand whether the program caused the observed effects. Study design is one component of causal thinking; an experimental or good quasi-experimental design is better.
The other things that you need are a strong association, a very big odds ratio or relative risk or a highly significant p-value on your effect size; the right time sequence, with the intervention preceding the outcome; and ideally some theoretical mechanisms, whether behavioural or physiological, that can help you explain why the intervention achieved the objectives that it did.
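As a small worked example of the 'strength of association' criterion, the sketch below computes a relative risk and an odds ratio from a 2x2 table of intervention exposure versus outcome. The counts are made up for illustration only.

```python
# Relative risk and odds ratio from a 2x2 table (made-up counts).
a, b = 30, 70   # intervention group: outcome yes / no (assumed)
c, d = 15, 85   # control group: outcome yes / no (assumed)

risk_intervention = a / (a + b)
risk_control = c / (c + d)
relative_risk = risk_intervention / risk_control
odds_ratio = (a * d) / (b * c)

print(f"risk (intervention) = {risk_intervention:.2f}")
print(f"risk (control)      = {risk_control:.2f}")
print(f"relative risk       = {relative_risk:.2f}")
print(f"odds ratio          = {odds_ratio:.2f}")
```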
I'm going to conclude with a couple of illustrative examples. Let's assume you want to run a cognitive intervention in aged care facility residents. In this example you might have three aged care facilities and you choose all residents over 80 years of age. You invite subjects to participate but only 30 of 100 eligible residents agree to participate and you then randomise them to intervention and controls, and you measure the outcome with a cognitive function self-complete measure to see whether the intervention improved their cognitive function.
Think about this: what are the problems in this study design? There are many. Firstly, you've only got three aged care facilities and there's clustering within those facilities, with people being more similar to each other; they may not be representative and they may talk to each other, so there's clustering and contamination. Secondly, you've chosen residents over 80: are they representative of all residents? What about residents in their 70s? That's the issue of generalisability. Next, you've got selection biases, because the 30 that participate might be quite different, in cognitive function, the outcome of interest, from the 70 that didn't participate. And finally, cognitive function is assessed with a self-complete measure, which may be a terribly biased measure of cognitive processes. In other words, "How well are you thinking today?" is not a good question to ask of people with cognitive impairment, so your measurement might fundamentally bias your results.
Here are some more examples. How would you evaluate a statewide mass media campaign? How would you evaluate a new screening test? How would you assess whether a new procedure is better than established practice? Finally, how would you best implement a new clinical guideline? This last one is an implementation science question. You might want to stop the video and think about what study designs you'd use for each of these four examples.
For a statewide mass media campaign you might have to use before and after representative population surveys. It's difficult to get a comparison group when it's something that everyone will be exposed to.
For evaluating the new screening test you might use the time series design and measure disease incidence changes before and after the introduction of the screening test. It would be possible here to have a comparison region as well if you introduce the screening test in only one region.
For the third example, the introduction of a new procedure, you might use a cluster randomised controlled trial, randomising groups or hospitals to receive the new procedure compared to those that don't receive it, or you might roll the procedure out over time using a stepped-wedge design.
And finally, understanding implementation of a new clinical guideline is a study where you're testing implementation strategies, so you might randomise groups to different implementation approaches with the level of implementation as your outcome, although the design is still a randomised controlled trial.
Think about those examples and their relevance in your work.
So, to conclude, study design depends on why you're doing the evaluation. You need to build a logic model: it guides the evaluation and the study designs that you will choose. You need to be sure of having the skills, expertise and sufficient time, and those things will influence what study design you choose. But, fundamentally, you choose the best study design that's feasible and affordable, to generate the best possible quality of scientific evidence that the intervention caused any changes that you see.
Sometimes you'll use mixed methods, with some qualitative focus groups or structured discussions with participants corroborating (or not) the findings that you observed in your quantitative evaluation; this process of checking whether the quantitative and qualitative methods point in the same direction is called triangulation.
And finally, your design is also based on the purpose or use of your findings; every study design needs to be fit for purpose.