September 2022
Welcome to the RAPPORT project blog!
RAPPORT stands for Developing a Rapid AI-based Policy Probing and Observational Research Tool. The project is funded by Wellcome via their Mental Health Data Prize.
I am really excited about this project. It is an amazing opportunity to apply and publicise new, powerful data science methods for mental health research. Thank you Wellcome! 😀
I am Paul Tiffin, the principal investigator (PI) for the project. Before I let the other team members introduce themselves, let me give you an overview of the project:
It’s important that treatments and health policies actually improve well-being and health. Sometimes well-intentioned healthcare interventions either don’t work for some groups of individuals, or can even cause unintended harms.
Scientific research is vital if we are to have evidence on whether
such interventions are likely to work, and for which groups of people.
Traditionally such evidence has often been provided by ‘randomised controlled
trials’ (RCTs). RCTs are experiments where usually one group of people have a
particular treatment, and another group are offered an alternative treatment,
or no treatment as such (e.g. a ‘placebo’, or sugar pill). Who gets which
treatment is decided by chance, hence the term ‘Randomised’ controlled trial.
In this way the actual causal effect of the experimental treatment,
compared to any alternative or placebo, can be worked out. This is because any
other characteristics of the people involved in the trial, that may be
associated with the health outcome of interest, should be similar between the
two groups, due to the randomisation.
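For readers who like to see this in action, here is a minimal simulated sketch in Python. It is not part of the RAPPORT analysis: the numbers and names (such as `simulate_trial`, `background` and `true_effect`) are made up purely for illustration. Each simulated person has a background characteristic that affects their health outcome, and a coin flip decides who gets the treatment. Because of the coin flip, the background characteristic should end up similar in both groups, so a simple difference in average outcomes should land close to the true effect built into the simulation.

```python
import random
import statistics


def simulate_trial(n=20000, true_effect=-2.0, seed=1):
    """Simulate a toy randomised trial (illustrative numbers only).

    Returns (gap in the background characteristic between the two groups,
    estimated treatment effect). Lower outcome scores mean better health.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        background = rng.gauss(0, 1)         # e.g. how unwell someone is at the start
        gets_treatment = rng.random() < 0.5  # allocation decided purely by chance
        outcome = 10 + 3 * background + rng.gauss(0, 1)
        if gets_treatment:
            outcome += true_effect
        group = treated if gets_treatment else control
        group.append((background, outcome))

    # Randomisation should make the groups alike, so this gap should be near zero.
    background_gap = (statistics.fmean([b for b, _ in treated])
                      - statistics.fmean([b for b, _ in control]))
    # With alike groups, the difference in average outcomes estimates the effect.
    estimated_effect = (statistics.fmean([o for _, o in treated])
                        - statistics.fmean([o for _, o in control]))
    return background_gap, estimated_effect


gap, effect = simulate_trial()
print(f"background gap between groups: {gap:.2f}")
print(f"estimated effect: {effect:.2f} (true effect in the simulation: -2.0)")
```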
However, RCTs can take a long time to set up and report their findings.
They are also very expensive to run, often costing several millions of
We now have lots of information on patients, in the form of data from
everyday practice and from research studies. However, unlike in RCTs, in the
real world patients aren’t given treatments according to chance. This can make
working out whether a particular health policy or treatment has actually caused
an improvement (or even worsening) of health very difficult.
However, using new statistical methods we can use routine health data,
and those from scientific studies (even those that don’t involve randomised
trials) to understand if new policies, practices or treatments cause
improvements in health. Moreover, we can now use mathematical approaches to
work out which interventions are likely to work for whom.
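To give a flavour of the problem and of one very simple fix, here is a hedged sketch in Python using simulated data. It assumes a made-up situation where people with more severe illness are more likely to be given the treatment, and where severity is recorded in the data. The adjustment shown is plain standardisation over one measured characteristic; it is not the method RAPPORT will use, and all names and numbers (`compare_naive_and_adjusted`, `severe`, the 80%/20% treatment chances) are hypothetical.

```python
import random
import statistics


def compare_naive_and_adjusted(n=40000, true_effect=-2.0, seed=1):
    """Toy 'real world' data where treatment is NOT given by chance.

    Returns (naive estimate, severity-adjusted estimate).
    Lower outcome scores mean better health.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        # Assumption: severely ill people are far more likely to be treated.
        treated = rng.random() < (0.8 if severe else 0.2)
        outcome = 10 + (4 if severe else 0) + rng.gauss(0, 1)
        if treated:
            outcome += true_effect
        rows.append((severe, treated, outcome))

    def mean_outcome(treated, severe=None):
        return statistics.fmean([o for s, t, o in rows
                                 if t == treated and (severe is None or s == severe)])

    # Naive: compare everyone treated with everyone untreated.
    naive = mean_outcome(True) - mean_outcome(False)

    # Adjusted: compare like with like inside each severity group,
    # then average the two differences by how common each group is.
    share_severe = sum(1 for s, _, _ in rows if s) / n
    adjusted = (share_severe * (mean_outcome(True, True) - mean_outcome(False, True))
                + (1 - share_severe) * (mean_outcome(True, False) - mean_outcome(False, False)))
    return naive, adjusted


naive, adjusted = compare_naive_and_adjusted()
print(f"naive estimate: {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f} (true effect in the simulation: -2.0)")
```

In this toy setup the naive comparison mixes up the effect of the treatment with the effect of being more unwell to begin with, while the adjusted comparison only works because severity was measured. Real data are rarely this tidy, which is one reason more sophisticated methods are needed.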
Machine learning is part of Artificial Intelligence (AI), where
decisions can be made automatically without a human being involved. In machine
learning computers can learn to recognise patterns in data in order to make
In the RAPPORT project we will use machine learning in two main ways.
Firstly, we will take traditional statistical methods used to understand
population health (‘epidemiology’) and combine these with machine learning.
This approach will enable us to better understand the causal effects of a
mental health intervention, not just whether it is associated with better
outcomes. This approach is known as ‘targeted learning’ and seems to be more
effective than existing statistical methods for this purpose. Targeted learning
has started to be applied to the understanding of physical health issues but
there are very few examples of it being used in mental health research.
The second way we will be using machine learning is to understand how
different groups of people respond differently to an intervention or treatment.
These methods are known as ‘causal forests’. They work by learning the rules
that can predict who is likely to respond most positively to an
intervention. This means we can identify the groups of people most likely to
benefit from a certain policy or treatment.
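The idea that an intervention can help some groups more than others can be illustrated with a small simulated sketch in Python. This is not a causal forest: a causal forest learns the relevant groups from the data, whereas here the two groups (`group_a` and `group_b`) are hypothetical and fixed in advance, treatment is allocated by chance, and the effect in each group is estimated by a simple difference in averages. All names and numbers are assumptions made for illustration.

```python
import random
import statistics


def subgroup_effects(n=40000, seed=1):
    """Estimate the treatment effect separately in two pre-defined groups.

    Lower outcome scores mean better health.
    """
    rng = random.Random(seed)
    # Assumption: the treatment helps group_a and does nothing for group_b.
    true_effects = {"group_a": -3.0, "group_b": 0.0}
    groups = list(true_effects)
    outcomes = {(g, t): [] for g in groups for t in (True, False)}

    for _ in range(n):
        group = rng.choice(groups)
        treated = rng.random() < 0.5  # allocation decided by chance
        outcome = 10 + rng.gauss(0, 1)
        if treated:
            outcome += true_effects[group]
        outcomes[(group, treated)].append(outcome)

    # Within each group: average outcome if treated minus average if untreated.
    return {
        g: statistics.fmean(outcomes[(g, True)]) - statistics.fmean(outcomes[(g, False)])
        for g in groups
    }


for group, effect in subgroup_effects().items():
    print(f"{group}: estimated effect {effect:.2f}")
```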
The initial ‘Discovery’ phase of the project will apply these new approaches to the Millennium Cohort Study data to assess the impact of childhood physical activity (an ‘active ingredient’) on depression. The Millennium Cohort Study (also known as ‘Child of the New Century’) is following the lives of around 19,000 young people born in the UK in 2000-02 (Millennium Cohort Study). It contains a lot of information on health and lifestyle.
For the ‘Prototyping’ phase, if funded, we plan to develop some proof of
concept tools, that will show how we can help other mental health researchers
access these new, powerful techniques in easily available user-friendly
Our work will involve and engage experts with relevant lived experience
from the start. Their knowledge and insights will help inform our approach to
data analysis, as well as how we make sense of, report and communicate our
Science in general, and machine learning studies in particular, has been
criticised recently for the lack of transparency and reproducibility. By this,
people mean that researchers do not always give all the details required for
other scientists to be able to replicate their results, and so confirm that they are
likely to be true and accurate. Working with experts by experience and
other stakeholders we will establish a framework for the transparent and
replicable implementation of this approach. This will involve ensuring we
report all the relevant details of our methods when we give our findings. It
will also involve clearly labelling all the computer code we have used and making
it publicly available. This will help other scientists in the field
understand what we did and how we got the results we report. More generally it
will set high standards for reporting these kinds of studies, encouraging and supporting
better practice in machine learning-based research.
Overall our project is ambitious, but feasible. Specifically, we aim to
understand the causal impact of ‘key ingredients’ related to
young people’s mental health. More generally, we intend to make new,
accessible, digital tools available to the research community and set a new
standard for transparency and reproducibility in reporting machine
learning-based studies.