The PACE Trial ‘Normal Range’ – an Untenable Construct
Analysis and opinion by Peter Kemp MA
[Emphases in quotes are added]
The PACE Trial was research into treatments for ‘ME/CFS’ which included Graded Exercise Therapy (GET), Cognitive Behavioural Therapy (CBT), Adaptive Pacing Therapy (APT) and Standardised Specialist Medical Care (SMC or SSMC, i.e. generic advice and medical care for CFS which all participants received – this was the Control/Comparison Group). The research was published in The Lancet in February 2011. The results were represented in the media as though GET and CBT were a great success, offered ‘hope’ to patients and could cure their illness. However, the published results were based on Outcome Measures which were completely different from those published in the Research Protocol. The PACE Trial investigators claimed that they changed the Protocol because the data ‘would be hard to interpret’; the effect of the changes, however, was to inflate the small differences between the treatments.
The PACE Trial Protocol Primary Outcome Measure criteria for ‘Positive Outcome’ required a score of 75 or more on the SF36PF (Short Form 36 Physical Function subscale); ‘Recovery’ required a score of 85. The Lancet report published in February 2011 did not include this data. Instead, the PACE investigators devised a construct which they called ‘Normal Range’ which required an SF36PF score of 60 or more. ‘Normal range’ was then represented as a significant treatment effect in the Lancet and worldwide media, even though it was 15 points below the threshold set for Positive Outcome in the Research Protocol.

Illustration: Comparing ‘Normal Range’ as used in the Lancet with the Primary Outcome Measures ‘Positive Outcome’ and ‘Recovery’ from the Protocol.
In the British Medical Journal paper ‘Short form 36 health survey questionnaire: normative data for adults of working age’, Jenkinson, Coulter and Wright reported that the mean SF36PF score for respondents reporting a long-standing illness was 78.3. For ‘Respondents not reporting long standing illness’ the mean score was 92.5. This suggests that the thresholds defined and published in the PACE Trial Protocol were realistic targets for an effective treatment.
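The gap between these normative figures and the thresholds at issue can be laid out in a few lines. All numbers are taken from the text above; the comparison itself is only a sketch:

```python
# Thresholds and normative means quoted in the text
protocol_positive_outcome = 75   # Protocol 'Positive Outcome' threshold
protocol_recovery = 85           # Protocol 'Recovery' threshold
lancet_normal_range = 60         # 'Normal Range' threshold used in the Lancet report

mean_longstanding_illness = 78.3     # Jenkinson et al., BMJ 1993
mean_no_longstanding_illness = 92.5  # same survey, no long-standing illness

# The 'normal range' threshold sits below even the mean score of
# working-age respondents who DO report a long-standing illness...
print(lancet_normal_range < mean_longstanding_illness)  # True

# ...and a full 15 points below the Protocol's 'Positive Outcome' threshold
print(protocol_positive_outcome - lancet_normal_range)  # 15
```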
THE PRINCIPLES OF GOOD CLINICAL PRACTICE FOR MRC-FUNDED TRIALS states:
“2.5 Clinical trials should be scientifically sound and described in a clear detailed protocol.
“2.6 A trial should be conducted in compliance with the protocol that has received prior Ethical Committee favourable opinion.”
Changing the Primary Outcome Measures
In 2007 Evans explored ‘When and How Can Endpoints Be Changed after Initiation of a Randomized Clinical Trial’. He states: “A fundamental principle in the design of randomized trials involves setting out in advance the endpoints that will be assessed in the trial, as failure to prespecify endpoints can introduce bias into a trial and creates opportunities for manipulation.”
Clinical Trials with poor treatment results might be a disappointment and even an embarrassment to researchers, but the results can still be important to science and medicine and should be published. The International Committee of Medical Journal Editors (as referenced in MRC Guidelines) states:
“Obligation to publish negative studies. Editors should seriously consider for publication any carefully done study of an important question, relevant to their readers, whether the results for the primary or any additional outcome are statistically significant. Failure to submit or publish findings because of lack of statistical significance is an important cause of publication bias.”
The PACE Trial Authors’ Explanation
The PACE Trial website FAQ2 (frequently asked questions) includes the following, published after the Lancet publication:
“27. Why did you change the analysis plan of the primary outcomes?
“A detailed statistical analysis plan was written, mainly by the trial statisticians, and approved by the independent Trial Steering Committee before examining the trial outcome data. This is common practice in clinical trials. We made two changes: First, as part of detailed discussions which took place whilst writing the statistical analysis plan, we decided that the originally chosen composite (two-fold) outcomes (both % change and the proportions meeting a threshold) would be hard to interpret, and did not answer our main questions regarding comparative efficacy. We therefore changed the analysis to comparing the actual scores. Second, we changed the scoring of one primary outcome measures – the Chalder fatigue questionnaire – from a binary (0, 1) score to a Likert score (0, 1, 2, 3) to improve the sensitivity to change of this scale. These changes were approved by the independent Trial Steering Committee, which included patient representatives.”
The differences between all four treatment arms remain proportionate whatever analysis is used. If the researchers were bent on answering “questions regarding comparative efficacy”, there was nothing to stop them from producing as many comparisons between the treatments as they wished, without discarding the Primary Outcome Measures and creating anomalies in the process.
Replacing the ‘composite outcomes’ because they would be ‘hard to interpret’ does not explain why, at the same time, the target thresholds were drastically reduced and the ‘Recovery’ criteria were eliminated entirely. Nor does it explain the need for a construct misleadingly called ‘Normal Range’, which was partly responsible for the misrepresentation of the research in the worldwide media.
Protocol version 5 (2006) records amendments made to the Protocol by the Investigators, including a review of the Primary Outcome Measures. The Protocol states:
“• .…We have made some minor changes to the protocol with the addition of measures in order to: properly measure meaningful outcome…” (p.79)
“• A modification to the primary outcome, by the addition of a 50% reduction in fatigue and physical disability being a positive outcome, alongside the previously approved categorical outcome.
- An additional primary outcome of needing both fatigue and physical disability to improve
- Operationalised criteria for recovery” (p.80)
This shows that the ‘composite measures’ that the investigators claimed ‘would be hard to interpret’ had been specifically included “in order to: properly measure meaningful outcome”.
Furthermore, Protocol v2.1 stated: “We propose that a clinically important difference would be between 2 and 3 times the improvement rate of SSMC”. This estimate followed a paragraph detailing analysis of previous research literature and was an authoritative projection of the results that the PACE Investigators expected. In fact the ‘improvement rate’ of GET and CBT reported in the Lancet was not 2 or 3 times that of SSMC but 1.3 times, and was only detectable due to the construction of ‘normal range’.
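A minimal sketch of that benchmark, using a hypothetical SSMC improvement rate. Only the 2-to-3-times benchmark and the 1.3-times figure come from the text; the rate itself is purely illustrative:

```python
# Hypothetical comparison-group (SSMC) improvement rate, for illustration only
ssmc_rate = 0.20

# Protocol benchmark: a clinically important difference would be
# between 2 and 3 times the SSMC improvement rate
expected_low, expected_high = 2 * ssmc_rate, 3 * ssmc_rate

# Ratio actually reported for GET/CBT relative to SSMC, per the text
observed_rate = 1.3 * ssmc_rate

# Whatever value ssmc_rate takes, a ratio of 1.3 can never reach 2,
# so the observed result cannot fall inside the benchmark range
print(expected_low <= observed_rate <= expected_high)  # False
```

Because both sides scale with the same SSMC rate, the conclusion does not depend on the hypothetical value chosen: 1.3 times falls short of the 2-to-3-times benchmark for any comparison-group rate.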
Page 21 of the Protocol states: “The two primary outcomes of self-rated fatigue and self-rated impairment of physical function will allow us to assess any differential effect of each treatment on fatigue and on function.”
Therefore Protocol v2.1 made specific provision to measure meaningful outcomes and differential effects of the treatments.
The investigators’ explanation in the FAQ shows that changes were made for the purpose of inflating results that were undetectable with the published Primary Outcome Measures because there were no significant differences. The justification for changing the Protocol is that they wanted to answer “questions regarding comparative efficacy”, even though the Primary Outcome Measures which they discarded had been designed for that exact purpose.
The PACE Trial Protocol states: “There is therefore an urgent need to: (a) compare the supplementary therapies of both CBT and GET with both APT and standardised specialist medical care (SSMC) alone, seeking evidence of both benefit and harm (b) compare supplementary APT against SSMC alone and (c) compare the supplementary therapies of APT, CBT and GET in order to clarify differential predictors and mechanisms of change.”
This could not be clearer. The trial was designed to compare the treatments with each other and with the comparison group, which raises the question:
If the trial was designed specifically to allow comparisons between the treatments, what made the researchers think that they would have to lower the target thresholds in order to “answer our main questions regarding comparative efficacy”?
It is as if they knew that the treatments had fallen below the thresholds set in the Protocol. If true, this would mean that alterations to the Primary Outcome Measures were not based on any rational target of ‘recovery’ or ‘positive outcome’, but were chosen to produce a ‘treatment effect’.
When the Protocol was designed, the researchers analysed previous research, including their own, into GET and CBT for ME/CFS. This informed their choice of the Outcome Measures.
The Protocol states: “We have chosen 15 sessions for all supplementary treatments on the basis of the previous trials of CBT and GET [18,23-26], as well as extensive clinical experience.” And: “The existing evidence does not allow precise estimates of improvement with the trial treatments. However the available data suggests that at one year follow up, 50 to 63% of participants with CFS/ME had a positive outcome, by intention to treat, in the three RCTs of rehabilitative CBT [18,25,26], with 69% improved after an educational rehabilitation that closely resembled CBT. This compares to 18 and 63% improved in the two RCTs of GET [23,24], and 47% improvement in a clinical audit of GET.”
Therefore the Primary Outcome Measures in the Protocol were based on existing research, including that done by the Principal Investigators themselves and on their “extensive clinical experience”. Why would these experienced physicians and researchers believe it was necessary to lower their own target thresholds by 30% to 50% in order to detect a treatment effect? How did they know that the treatments had not reached the Protocol thresholds that they had so authoritatively defined?
Changing the Primary Outcome Measures in order to exaggerate the small differences between the treatment arms created the appearance of treatment effects that had been below the threshold of significance. It also created disturbing anomalies.
In response to an FOI request to Queen Mary, University of London, asking: “…how many participants were in the normal range in either the SF-36 or Chalder Fatigue Scale at the beginning of the trial, ie one or the other?”, Paul Smallcombe, Records & Information Compliance Manager, provided data on 12 February 2013 which shows that 85 (13%) of participants were within ‘normal range’ on at least one measure before treatment. Three participants met ‘normal range’ for both measures and, according to The Lancet, were ‘recovered’ before they even started treatment.
This must be the first time in the history of medical research that it was theoretically possible for participants with a severe and disabling illness to receive treatment in a Clinical Trial, deteriorate, yet be declared successfully treated on reaching ‘normal range’, and be represented in the Lancet as ‘recovered’.
No fewer than 78 participants (12.19%) were within ‘normal range’ for the SF36PF at BASELINE. These were not valid participants because, according to White et al. and the Lancet, they did not have the disease under investigation. They were ‘normal’, and as such they skewed the data.
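As a quick consistency check on these figures (the denominator is inferred from the quoted numbers, not stated in the FOI response):

```python
# Figures quoted above: 78 participants, reported as 12.19% of the trial
baseline_normal_sf36 = 78
reported_percent = 12.19

# Work back to the implied number of participants
implied_total = round(100 * baseline_normal_sf36 / reported_percent)
print(implied_total)  # 640

# The 85-participant figure (normal range on at least one measure)
# is consistent with the same denominator: 85/640 rounds to 13%
print(round(100 * 85 / implied_total))  # 13
```

The implied denominator of about 640 is plausible, given that the Lancet report describes 641 randomised participants.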
Ben Goldacre remarked in The Guardian, “…in a trial, you might measure many things but you have to say which is the “primary outcome” before you start: you can’t change your mind about what you’re counting as your main outcome after you’ve finished and the results are in. It’s not just dodgy, it also messes with the statistics.” Goldacre added, “You cannot change the rules after the game has started. You cannot even be seen to do that.”
The Declaration of Helsinki states: “30. Authors, editors and publishers all have ethical obligations with regard to the publication of the results of research. Authors have a duty to make publicly available the results of their research on human subjects and are accountable for the completeness and accuracy of their reports. They should adhere to accepted guidelines for ethical reporting. Negative and inconclusive as well as positive results should be published or otherwise made publicly available….”
The World Health Organization (2004), A Practical Guide for Health Researchers states in, Writing the research protocol:
“… once a protocol for the study has been developed and approved, and the study has started and progressed, it should be adhered to strictly and should not be changed. This is particularly important in multi-centre studies. Violations of the protocol can discredit the whole study…”
The Research Councils UK, Policy and Code of Conduct on the Governance of Good Research Conduct, states:
“All research should be conducted to the highest levels of integrity, including appropriate research design and frameworks, to ensure that findings are robust and defensible.
“This code therefore concentrates on entirely unacceptable types of research conduct. Individuals involved in research must not commit any of the acts of research misconduct specified in this code.”

The acts of misconduct specified in the code include:

“the inappropriate manipulation and/or selection of data, imagery and/or consents”; and

“misrepresentation of data, for example suppression of relevant findings and/or data, or knowingly, recklessly or by gross negligence, presenting a flawed interpretation of data”.
By happy chance or design, it seems that the threshold chosen for the SF36PF ‘Normal Range’ was positioned in the exact spot, a score of 60, needed to make just enough participants ‘normal range’ for the PACE Trial to be able to claim a treatment effect. Had the threshold been set at 65, it appears that a substantial number of those classed ‘normal range’ would have been eliminated. Had it been set at 55, it would have included substantially more of the Comparison Group (SMC, purple on the chart below). It further appears that the authoritatively defined Protocol threshold of 75 for ‘Positive Outcome’ might have included only a few participants, and it would be surprising if more than a handful reached the Protocol ‘Recovery’ threshold.
Please note that the researchers stated: “We propose that a clinically important difference would be between 2 and 3 times the improvement rate of SSMC”. Had the researchers’ expected outcomes for their favoured treatments been reached, the average scores of GET and CBT at 52 weeks should have been somewhere between 65 and 80. The published scores fell short of this range, so there was no “clinically important difference” between the comparison group and the treatments.
Illustration: Results as published in the Lancet, with the various thresholds added in orange.
In view of the above, one might take the view that the PACE investigators’ complaint that the original Protocol Primary Outcome Measures “did not answer our main questions regarding comparative efficacy” has some validity. It does not. The key word is ‘efficacy’. Based on their own research and “extensive clinical experience”, the researchers set the Primary Outcome thresholds to detect “clinically important differences”. To that end they defined ‘Positive Outcome’ and ‘Recovery’ in order to measure efficacy. The failure of the treatments to reach those authoritative thresholds showed an absence of efficacy, making comparisons meaningless.
The PACE Protocol, under “Methods/Design Aims”, states:
“The main aim of this trial is to provide high quality evidence to inform choices made by patients, patient organisations, health services and health professionals about the relative benefits, cost-effectiveness, and cost-utility, as well as adverse effects, of the most widely advocated treatments for CFS/ME.”
Had the research followed the Protocol in accordance with established scientific standards for medical research, the results would be of considerable interest to medical professionals, their patients and the public. Unfortunately, the data has been withheld from them. Information which could affect the treatment choices and medical support of participants and patients alike will probably remain hidden until the research data is analysed according to the Research Protocol’s authoritative and rational measures of “Positive Outcome” and “Recovery”.
Below is a table showing varied interpretations of SF36PF scores
(White was the PACE Trial chief investigator; Bleijenberg and Knoop published a comment on the PACE Trial accompanying the PACE report in the Lancet; Wearden was chief investigator of the FINE Trial, a ‘sister study’ of the PACE Trial; Reeves was head of the CFS department at the CDC, USA)
By no objective interpretation does the ‘Normal Range’ threshold represent recovery from ME/CFS. For some participants ‘Normal Range’ may have represented some improvement, but unless they far exceeded the threshold, the evidence suggests that they remained severely ill and restricted in all activities.
Illustration: how the ‘Normal Range’ construct threshold compares with the SF36PF averages of some common disabling conditions.

Critical evaluation of ‘Normal Range’ shows that it is clinically meaningless
The PACE Trial construct of ‘Normal Range’ is clinically meaningless and misleading.
Some reference hyperlinks may be out of date but should be easily recoverable by searching Google with quoted phrases inside speech marks. E.g. “research study involving human subjects must be clearly described” will direct you to a copy of the Declaration of Helsinki.
The Lancet, PACE Trial report. Volume 377, No. 9768, p823–836, 5 March 2011. http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(11)60096-2/fulltext
PACE Trial Protocol. BMC Neurology 2007, 7:6. doi:10.1186/1471-2377-7-6
Jenkinson C, Coulter A, Wright L. Short form 36 health survey questionnaire: normative data for adults of working age. BMJ. 1993 May 29; 306(6890): 1437–1440. PMCID: PMC1677870
Evans S (2007) When and How Can Endpoints Be Changed after Initiation of a Randomized Clinical Trial. PLoS Clinical Trials 2(4): e18. doi:10.1371/journal.pctr.0020018. Online: http://clinicaltrials.ploshubs.org/article/info:doi/10.1371/journal.pctr.0020018
Online: http://www.ich.org/LOB/media/MEDIA485.pdf
Ben Goldacre. Clinical trials and playing by the rules. The Guardian, Saturday 5 January 2008.