Randomised controlled trials are the gold standard of clinical research. This section covers what is required of a good randomised controlled trial (RCT) and also introduces meta-analysis. It has the following sections:
- Why we Need Randomised Controlled Trials
- The Placebo Effect
- The Hawthorne Effect
- Analysis by Intention to Treat
- The Power of a Trial
- Getting a Drug to Market
- Planning a Trial
- Getting Published
- After the Trial
- Outside of Medicine: a suggestion of a trial someone might like to try
- Further Resources
- Site Index
If you wish to go directly to a section, click on the blue underlined title.
Randomised controlled trials or RCTs ask the basic question, “Does it really work?” They are largely a British invention dating back to the 1940s. The basic principles are fairly simple, but the reality of a good trial is much more complex, as there are many factors that may distort results and they have to be taken into account. We also look at meta-analysis, its power and its limitations.
Why We Need Randomised Controlled Trials
Medical research needs basic science such as cell biology and biochemistry. However, most medical research is not conducted in laboratories but on people and it falls into one of three categories:
- Randomised controlled trials or RCTs
- Cohort studies, also called longitudinal studies
- Qualitative research
The first two are called quantitative research as they involve numbers; the last is for matters that do not lend themselves to numerical analysis and is called qualitative research. These three types represent most of the research that appears in the general medical journals.
Another type of paper that may be found in medical journals is the case report. Case reports describe one or several patients and may illustrate interesting points. They may even point to future research. However, they are not research, and the two must not be confused. A notable example is the infamous paper by Andrew Wakefield and colleagues about the MMR vaccine, suggesting a link to autism and inflammatory bowel disease. This was simply a report of a dozen cases. It was not research. It was dishonest in that it did not declare that many of the subjects had been referred with a view to suing the manufacturers. In addition, the relationship in time between the vaccine and the onset of symptoms had been fraudulently shortened to achieve impact. It has since been retracted by all authors except Andrew Wakefield. There is much more about this in the section on Fake News and Vaccine Scares.
Years ago, a doctor might have treated, say, 25 patients with arsenic for malaria, of whom six recovered. He would produce a series of six cases and no one would know about the 19 failures, although they represent more than 75% of the total. Even if he did include the 19 who died, as there is no control group the reader does not know whether those six survived because of the arsenic, regardless of the arsenic, or despite the arsenic.
Suppose that you have just invented a new antibiotic or an anti-cancer drug, or perhaps you have a traditional herbal remedy. You want to know if it works. You could give it to someone and see if that person gets better, but that single case does not really tell you much at all. Suppose that you have a cold and you take penicillin. You get better and decide that penicillin cures the common cold. This is wrong as you would have got better just as fast without the penicillin. Suppose that someone had cancer and you gave him your new anticancer drug but he died. Does that mean that it is useless? Perhaps he was one of the 30% who would die if given the drug compared with 70% who would die without it. Isolated cases mean nothing.
The basis of a controlled trial is that a number of people are split into two groups. One group receives the treatment and the other does not. The trial group may receive a drug or an intervention such as radiotherapy or acupuncture. Does the group with the drug or intervention do better than the other group? It is very rare for all of one group to do well whilst all of the other group do badly. Therefore we use statistics to decide whether one group did significantly better than the other or whether the difference could well have occurred by chance.
If you take a coin and toss it 10 times, how many times would you expect it to land heads? It may be 5 times but you would not be surprised if it was 4 or 6 times. If you toss it 10 times and it lands heads on 7 occasions or just 3 times, that is unexpected but it does not necessarily mean that the coin is loaded. It could well occur by chance. If you toss it 100 times and it lands heads 70 times, that is much more significant than 7 times out of 10. It represents 70% rather than the “expected” 50% in both cases but the larger number makes the variation from the “expected” more significant.
I have used quotation marks for “expected” as it is not necessarily what you would expect. If you toss it 10,000 times and it lands heads 7,000 times it is almost certainly loaded. The more times it has been tossed the more significant this deviation becomes. This illustrates the importance of sample size or the number of people in a trial. Small trials may be misleading. The results are more likely to occur by chance.
You could even toss an unloaded coin 10 times and it comes down heads every single time. However, the chance of this happening is 1 in 2¹⁰, which means 1 in 2 multiplied by itself 10 times, or 1 in 1024. This is very unlikely but not impossible. I may win the jackpot in next weekend’s National Lottery by picking all the correct numbers, but the chance of my numbers coming up is about 1 in 14 million. However, most weekends someone somewhere wins. Next weekend it could be me. Actually it will not be me, as I do not do the lottery: I regard the odds as far too heavily stacked against me. Remember that since the time of Plato’s pupils we have accepted that we cannot be absolutely certain about anything. A 1 in 1,000 chance is unlikely but not impossible. A 1 in 14 million chance is extremely unlikely but it could be you. Science, especially medicine, is not about certainties but probabilities.
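The coin-tossing arithmetic above can be checked with a few lines of Python (the language choice is mine; the sums are simply binomial tail probabilities):

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    # Probability of at least k heads in n tosses of a fair coin
    # (the upper tail of the binomial distribution).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(prob_at_least(7, 10))    # about 0.17: 7 heads in 10 is quite common
print(prob_at_least(70, 100))  # under 1 in 10,000: 70 in 100 is another matter
print(1 / 2**10)               # 10 heads in a row: 1 in 1024
```

The same 70% proportion of heads is unremarkable in 10 tosses yet overwhelming evidence of a loaded coin in 100, exactly as the text argues.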
Clinical medicine is about assessing the patient and balancing benefits and risks. Sometimes the doctor will get it wrong and will harm rather than help the patient. That does not necessarily mean that his decision, based on the best evidence available at the time, was fundamentally wrong.
In a trial, suppose that the treatment group had success in 7 out of 10 but the control group had success in 6 out of 10. Is the treatment beneficial? If one of the successes from the treatment group had instead been in the control group and one of the failures in the control group had been in the treatment group, the results would have been reversed. It may be better. It may not. The result is too close to call. On the other hand, if the figures were 700 out of 1,000 versus 600 out of 1,000, the result would be much more impressive.
So how do we prove our theory or hypothesis? We do not. We try instead to disprove. Science is a quest for the truth, not an attempt to justify a predetermined position.
Karl Popper said that a scientific theory that cannot be disproved is of no value. We do not try to prove our theory but to disprove it. We take the null hypothesis. This states that there is no difference between the two groups and this is what we put to the test. We may want the null hypothesis to fail to show that our treatment works, but that is how we start. We compare the results of the two groups and do a statistical analysis to show the chance that those results would be achieved if the two groups were really the same. The details of the statistics are not necessary, only the concept. The mathematics of statistics can be very complex.
If the statistics show that there is a strong possibility that these findings could have occurred by chance if there is no difference between the groups, we regard the value of the treatment as unproven. The null hypothesis has not been disproved. However, if the statistics show that it is very unlikely that these results would occur by chance if both groups were similar, we regard the result as positive. The null hypothesis has been disproved. The treatment works.
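The 7-out-of-10 versus 6-out-of-10 example can be put through exactly this process. A minimal sketch in Python, using a one-sided Fisher's exact test (one of several tests a statistician might choose for a 2x2 table), shows why the small trial proves nothing while the large one is decisive:

```python
from math import comb

def fisher_one_sided(a, n1, c, n2):
    # One-sided Fisher's exact test: a successes out of n1 in the
    # treatment group, c out of n2 in the control group.  Returns the
    # probability of seeing a or more treatment successes if the null
    # hypothesis (no difference between the groups) were true.
    m = a + c              # total successes across both groups
    N = n1 + n2            # total subjects
    hi = min(n1, m)
    return sum(comb(m, k) * comb(N - m, n1 - k) for k in range(a, hi + 1)) / comb(N, n1)

print(fisher_one_sided(7, 10, 6, 10))        # 0.5: entirely compatible with the null
print(fisher_one_sided(700, 1000, 600, 1000))  # far below 0.001: the null is rejected
```

With 10 per group the result is no better than a coin toss; with 1,000 per group the same 70% versus 60% split would essentially never arise by chance.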
People sometimes say, “Statistics prove…” but statistics are unable to prove anything. They merely indicate probability. There will be much more about statistics in Basic Maths in Medical Research and Decision Making but it is still kept fairly basic and simple.
The next question is what level of probability or chance we accept as statistically significant. At what level do we reject the null hypothesis? The honest answer is that this is quite arbitrary. For fairly small trials it is often set at a 5% probability or a 1 in 20 chance. In other words, there is less than a 5% or 1 in 20 chance that these results occurred at random. This is usually written as P=0.05 or, if the chance is less than 5%, as P<0.05. P means probability. For larger trials it may be possible to show that P<0.01, which means that the chance of this being a random finding is less than 1%, or even P<0.001, which means that the chance is less than 1 in 1,000. It is not impossible that this result was achieved by chance, but it is very unlikely.
It is purely custom and practice that sets the level of probability that is regarded as statistically significant. Remember that there is no such thing as absolute certainty. Reviews and systematic reviews show that policy is not based on a single trial. They look at a number of large and well conducted trials. If all or nearly all show a low probability that the intervention does not work, we take it as proven. It does work.
The first ever controlled trial may be attributed to James Lind in 1747, when he took 12 sailors with scurvy and put them into six groups of two to test various possible treatments. However, groups of two are unreasonably small. He was lucky that the result was so spectacularly impressive. The first proper controlled trials are usually considered to be from the late 1940s. There was a Dutch trial of paludrine for malaria, and this was successful. The British Medical Research Council (MRC) did a memorable trial of streptomycin in tuberculosis of the lungs in 1948. [1: The First Randomised Controlled Trial] This trial required about 1,000 patients to show the benefits of streptomycin. However, four years earlier, in 1944, the MRC did a controlled trial of patulin, a little known antibiotic, on the course of common colds. [2: The first properly controlled multicentre trial] It was useless. We now know that antibiotics are ineffective against viruses. The streptomycin and paludrine trials are remembered but the patulin trial is not. This shows that positive results are much more memorable than negative ones.
The Placebo Effect
Since the 1940s, controlled trials have become much more sophisticated, although the basic principles still apply. A major problem is giving one group something whilst giving the other group nothing. Just being given something can have a profound psychological effect and this affects outcome. This is easy to understand if the drug being tested is for pain or for a psychological problem such as depression or anxiety, but it is important across all types of conditions. Therefore it is necessary to give both groups something, and neither group should know if they are receiving the active or the dummy drug. The dummy drug is called a placebo, from the Latin for “I shall please”. [3: Online etymological dictionary]
It should resemble the active drug as closely as possible. It should be a similar size, shape and colour. Some drugs have a distinctive taste. Quinine is bitter and tonic water was devised to make it palatable. It may be possible to circumvent this problem by giving it in a capsule that is swallowed whole. The usual placebo is the milk sugar lactose in tablet or capsule form. The placebo effect of capsules seems to be stronger than that of tablets and the colours are important. Therefore both the test drug and the placebo must be presented in identical form.
When side-effects are characteristic, it may be difficult to disguise the identity of the drug. For example, in a cancer trial, prednisolone is a steroid that produces the characteristic moon face of Cushing’s syndrome. The cancer drug cis-platinum causes considerable nausea and sickness whilst cyclophosphamide readily causes hair loss. Nevertheless, as far as possible the patient should not know to which group he has been allocated. This also applies to those who look after the patient and those who assess progress. Just knowing that the patient is in one group or another may subconsciously affect the way that he is treated or the assessment made.
People want their new treatment to work. The placebo effect is potent and important. An excellent and clear review from the American Academy of Neurology is written in simple English without technical words. [4: The placebo effect]
The more we learn about the placebo effect the more potent it appears to be. Even patients in the placebo group may have to withdraw from a trial because of intolerable side effects. If the patient does not know which group he is in, this is called blind or single blind. If neither the patient nor those who treat him and assess him know which group he is in, this is called double blind. Double blind randomised controlled trials are regarded as the gold standard of assessment for a treatment.
Neither the subject nor the researcher should know which group the person is in.
From a purely practical perspective, some have asked, “If something works but as a placebo, why not give it if it gets results?” The ethics of placebo treatment are complex [5: Role of placebo in clinical practice] but the strength of the effect does depend upon the person and the circumstances.
Since around 1950, blinding has been seen as an essential technique for RCTs. This means that people should not know which group they are in. However, a paper in the BMJ in January 2020 cast doubt on this revered principle. [6: Impact of blinding on estimated treatment effects in randomised clinical trials: meta-epidemiological study] The authors performed a meta-analysis of 142 Cochrane meta-analyses, incorporating 1,153 randomised trials. They concluded that there is no evidence that a lack of blinding leads to exaggerated estimates of treatment effects.
In another paper in the same edition, the authors argue that blinding may hamper recruitment and retention of participants. [7: Fool’s gold? Why blinded trials are not always best] It can also compromise patients’ safety and render the evidence less applicable to real life care. They prefer open label but with adequate randomisation, allocation concealment, and blinded objective outcome assessment.
That is not to suggest that we should dump blinding but omitting it may permit more randomised trials in areas of healthcare where trials have been deemed hard to perform. Getting an adequate placebo for surgery is obviously a serious problem.
There may already be a treatment for the condition, in which case it would be unethical to withhold effective treatment and give a placebo. [8: Ethics and placebo] When this happens it is necessary to compare the new treatment with the current treatment. This overcomes the ethical problem of failing to treat a treatable condition. In addition, the important question is not, “Is this new treatment better than nothing?” but, “How does this new treatment compare with standard therapy?” Again the placebo effect must be considered and, if possible, the patient should not know which treatment he receives. We may need to look not only at effectiveness but also at how it compares with other drugs for adverse effects.
Practitioners of complementary and alternative medicine such as homeopathy and herbal remedies often argue that their discipline cannot be tested with placebo-controlled trials as their remedies are individually concocted for the specific problems of the patient. This is untrue. They can go through their usual routine and produce their usual medicine and then a third party either gives that to the patient or gives him something that is indistinguishable. In the case of homeopathy this is either the highly diluted remedy or water straight from the tap.
Physical therapies are more difficult. Physiotherapists often use equipment to provide short wave therapy or ultrasound. Techniques that have been used include going through the motions of using the equipment whilst it is turned off or de-tuned so that it does not work. This might mean that the therapist is also unaware whether the patient is receiving the treatment or the control. It is far more difficult to design a placebo for massage, as it is for exercises and education. There have been some cunning plans for acupuncture involving placing the needles in the wrong position or making the insertion too shallow. It is also difficult to produce a sham treatment for manipulation, as in osteopathy or chiropractic. There are other problems that will be discussed in the sections on Acupuncture and Manipulation of the Spine.
The placebo effect is very strong and very important. It works in all fields, not just those where a large psychological component is to be expected. The placebo effect is one reason why bogus therapies seem to work. [9: Why bogus therapies seem to work] People may expect bad side-effects, and this is called the nocebo effect. It works in the same way as placebo. [10: The Biochemical and Neuroendocrine Bases of the Hyperalgesic Nocebo Effect]
Surgery is also a strong placebo. Strictly speaking, to compare surgical interventions, one group should have the full operation whilst the other group should merely be opened up and closed again, without removal of the gall bladder or whatever procedure is being studied. Afterwards the patient is not told whether or not the operation was completed. However, this would cause enormous problems, both in finding volunteers and in getting such a protocol through an ethics committee. Ethics, including informed consent and ethics committees, will be discussed later.
Not all faith healers are frauds, in that some do genuinely get people better, although others are circus performers with stooges who are “miraculously cured” at each place where they stop. Even Jesus needed faith and sometimes He would say, “Your faith has cured you.” [11: Gospel according to Luke, chapter 18, verse 42] When He returned to His own home town of Nazareth His success rate fell dramatically, as everyone knew Him as the local carpenter and they knew His mother, brothers and sisters who lived among them. [12: Gospel according to Mark, chapter 6, verses 1-6]
Parkinson’s disease (PD) is caused by an imbalance between the nerve transmitters dopamine and acetylcholine in the brain. This is why L-dopa is given to boost the dopamine levels in the basal ganglia of the brain. A trial has shown that those given placebo have “a substantial release of endogenous dopamine”. [13: Mechanism of placebo effect in Parkinson’s disease] Endogenous means naturally occurring from within. In other words, the brain starts to produce its own dopamine in response to the placebo. The placebo response is very real and has a physical explanation.
Another problem related to the placebo effect is how to allocate a patient to a group at random without making it obvious which one he is in, which would negate the placebo control. In the early streptomycin trial for tuberculosis, patients were allocated alternately to the treatment or the control group as they were recruited. A number of other techniques have been suggested, such as whether the person’s birthday or hospital number is odd or even. The trouble is that the person who recruits the subject will know which group he will be in before he recruits him. This may affect his decision whether or not to recruit, or he may give subconscious signals to the subject that let him know his group.
The technique used most often today is based on random numbers. Lists of random numbers can be obtained. It is possible to take each number as it appears on the list, put it in an opaque envelope and seal it so that no one knows which number is inside. Only after the person has been recruited is the sealed envelope opened to reveal the group: for example, an odd number may mean the active group and an even number the control group.
Neither the subject nor anyone who will treat or assess him knows which group he is in. The envelope may be opened by a pharmacist who then gives the subject his medication with a code on it and only at the end of the trial is the code broken to show which group he was in.
This technique can also be used if there are more than two groups to be studied. These techniques may mean that there are uneven numbers in each group. This is not a problem provided that there are adequate numbers in both groups. This may seem very “cloak and dagger” but preserving the blind or double blind principle is extremely important. Do not underestimate the power of placebo in any field. [14: Introduction to placebo effects in medicine]
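The envelope procedure can be mimicked in software, and modern trials often use a computer-generated list of exactly this kind, held only by the pharmacist. A small sketch in Python (the function name and the two-arm design are my own illustration, not a standard library):

```python
import random
from collections import Counter

def make_allocation(n_subjects, arms=("active", "placebo"), seed=None):
    # One random arm per recruit.  The recruiter and the subject see only
    # the neutral code number; the arm is revealed only when the code is
    # broken at the end of the trial.
    rng = random.Random(seed)
    return {code: rng.choice(arms) for code in range(1, n_subjects + 1)}

allocation = make_allocation(200, seed=2024)
print(Counter(allocation.values()))  # the two groups need not be exactly equal
```

Note that, just as with envelopes, simple randomisation can leave the groups slightly uneven; with adequate numbers in each group this does not matter.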
It would seem logical to believe that the placebo effect can only work if the recipient believes that it is an active product. Therefore, I was astounded when I came across a review which showed that even open label placebo can sometimes be of value. [15: Open label placebo: can honestly prescribed placebos evoke meaningful therapeutic benefits?] “Open label” means that the person knows what he or she is getting, and in this case it is placebo. This paper can be accessed in full, without charge. It seems that they are most likely to be useful in conditions with a large subjective component such as pain, anxiety or nausea, but they will not shrink tumours or lower cholesterol. Nevertheless, it is fascinating to find that sometimes, honestly giving a placebo, telling the patient what it is, can work.
The Hawthorne Effect
The Hawthorne effect is named after a business efficiency study at the Hawthorne Works of the Western Electric Company in Chicago in 1924. The workers were divided into two groups. One group was asked to work in a different way as part of a time and motion study. The other group was asked to continue as before. Productivity improved in both groups. [16: Systematic Review of the Hawthorne Effect]
The Hawthorne Works of the Western Electric Company. Most of the workers are wearing white shirts and ties to work on a production line in a factory.
The conclusion was that when a group of people know that they are being observed this will change their behaviour. Some authors deny that the Hawthorne effect exists and claim that at best it is a variant of the placebo effect. However, a person in a group who knows that they are being observed will behave differently.
Perhaps there is a trial of education and awareness to ask if it leads to a healthier lifestyle. Even if you were in the control group, you would be more conscious of a healthy lifestyle and so this knowledge of being observed would have an effect on behaviour. If the intervention group know that they are being observed whilst the control group do not, any benefit may be from the Hawthorne effect rather than the intervention. The same applies to “before and after” studies where knowing one is being observed may have made the difference to the “after” or to have contributed to it.
Analysis by Intention to Treat
A patient may be allocated to a group but is unable to complete the course. Sometimes cancer patients are too ill for the treatment. Sometimes side effects or other problems cause the patient to leave the trial. How are such cases managed when the results are analysed? Some people have put those who cannot take the active treatment into the control group but this is unfair. If someone is too ill to receive a cancer treatment and he is transferred to the control group, this loads very sick patients into that group. The groups are not similar and the trial is invalid. This has happened remarkably often in the past.
An example of this type of error was a trial to show if education reduced teenage pregnancy and for a while this was held up by the Department of Health as an example of its effectiveness. Teenagers were divided into two groups. One group was offered a package of education. The other group was not. However, those who were offered the education but refused it were transferred to the control group. This assumes that there is no difference between those who accepted and those who rejected the education. This is not a valid assumption.
How then should we manage those who for any reason fail to start or to complete the course? They may have dropped out due to side effects. The answer lies in the words “analysis by intention to treat”. This means that when the statistics are analysed, that subjects are kept in their original group whether or not the treatment was completed. The ultimate question that is being asked is, “Should we offer this treatment to patients with this condition?” not, “Does it work in those who manage to complete the course?”
The Power of a Trial
A common problem from the 1960s and 1970s, and even into the 1980s, was that the results of trials seemed promising but failed to achieve the required level of statistical significance. The numbers were too small. Such trials are said to be “under-powered”. If a number of such trials are put together, a meta-analysis may produce a result but, as will be explained later, there are problems with meta-analysis.
Nowadays this happens less often because it is possible to decide how large a sample (number of participants) is required before setting out. If that number is too large to be achieved from a single centre, then multicentre trials or even multinational trials are conducted. There are examples where countries across Europe or across the Atlantic have participated in a single trial.
Getting a Drug to Market
Getting a drug on the market is a very long and arduous process. Of the chemicals that are examined, very few reach the stage of trials and many of these fall by the wayside either because of lack of efficacy or because of toxicity. It is also a very expensive process, bearing in mind how many of the drugs that are initially developed do not even reach the market.
After animal testing comes phase 1 trials. These involve giving the drug to perhaps a dozen healthy volunteers. A couple may receive placebo but others will receive the drug at various doses. The aim is to make sure that there is no undue toxicity. There was a serious problem with a drug known as TGN1412 in a test at Northwick Park Hospital in London in March 2006. This will be discussed further in the chapter on Ethics in Practice and Research.
After phase 1 trials come phase 2 and phase 3. These are trials involving patients who have the condition to be treated. Phase 2 trials are fairly small but, if the results are promising, rather larger phase 3 trials follow. There will probably be a total of about 3,000 patients in most databases by the time that the results are submitted for approval to get the drug licensed.
Planning a Trial
The basic planning is very important. It is unacceptable to keep moving the goal posts as the trial progresses to try to get a positive result. First the researchers have to decide what they are going to measure. This may be the number of people cured or still alive after a certain time. It may be level of pain and there are a number of pain charts or “pain thermometers” that have been validated for use. Results in both groups can be expressed as a percentage.
Then the researchers have to decide what represents a clinically significant improvement. A very large trial may show that a tiny improvement is statistically significant, but if it is tiny it may not be clinically significant. What counts as significant is a subjective decision. It might be the difference between 20% and 30%, or between 60% and 80%. The researchers can then go to a series of tables, find the entry for the difference they hope to detect, and read off how many subjects they need in each group to have an 80% chance of showing a statistical difference at the 5% level if that difference exists. This still leaves a 20% chance of failing to show the difference even if it does exist.
Another column will show the sample size needed to have a 90% chance of demonstrating significance at the 5% level. This will be a larger number, but it still leaves a 10% chance of failing to find a significance that does exist. In deciding what they regard as clinical significance, the researchers will be aware that a larger difference will require fewer subjects but is less likely to be achieved, whilst a smaller difference is more likely to be achieved but will require more subjects.
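The tables the researchers consult are generated from a standard formula for comparing two proportions. A sketch of that calculation in Python (this is the usual normal-approximation formula; the z constants are the standard values for a two-sided 5% test and for 80% or 90% power):

```python
from math import sqrt, ceil

Z_ALPHA = 1.96                        # two-sided 5% significance level
Z_POWER = {0.80: 0.8416, 0.90: 1.2816}  # standard normal deviates for power

def n_per_group(p1, p2, power=0.80):
    # Approximate number of subjects needed in each group to detect a
    # true difference between success rates p1 and p2 at the 5% level.
    p_bar = (p1 + p2) / 2
    num = (Z_ALPHA * sqrt(2 * p_bar * (1 - p_bar))
           + Z_POWER[power] * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / (p1 - p2)) ** 2)

print(n_per_group(0.60, 0.80))              # a large difference needs relatively few subjects
print(n_per_group(0.20, 0.30))              # a smaller difference needs far more
print(n_per_group(0.20, 0.30, power=0.90))  # and 90% power needs more still
```

The trade-off described above falls straight out of the formula: halving the difference to be detected roughly quadruples the number of subjects required.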
It is important that the design of the trial is made clear at the outset, in the planning stages, and this is adhered to.
- State the null hypothesis. This states that there is no difference between the two groups. You are probably hoping to disprove it.
- Criteria for inclusion and exclusion
- What is to be measured
- How it is to be measured
- Number of subjects to be recruited
- How they are to be allocated to a group and how that group is kept secret
- Nature of control group. It may be placebo or current best treatment
- How the results will be analysed
All this is important from the start, especially in multicentre trials. In cancer trials the type of cancer is defined and also the stage to be treated. Most cancers are classified into a number of stages according to how advanced they are. It is invalid to compare groups where the cancers in one are at a more advanced stage than in the other.
Before starting it is necessary to get approval from an ethics committee. This will be discussed much further in the chapter on Ethics in Practice and Research. It used to be an optional extra but now it is compulsory in most countries. The criteria set out above should not be changed during the trial without the committee’s approval. It costs money to run trials and funding will not be forthcoming unless ethics approval has been granted. The trial should also be registered. This includes registering all the bullet points above, so that it should not be possible to start looking at other parameters or doing subgroup analysis to find a positive result somewhere after the trial has started or finished.
It is important to get the groups as similar as possible but without massaging subjects from one group to the other. Strict criteria for entry are important, and this is really how to get uniformity. When a painkiller is being tested it is best to be treating the same cause of pain in each person. Pain after removal of wisdom teeth is a common scenario. Sometimes it is possible to get patients to act as their own controls. Perhaps it is a trial of pain relief for a chronic condition such as osteoarthritis.
Chronic means long lasting, from the Greek word chronos meaning time. It does not mean severe. Acute means of brief duration and subacute is between the two. If A is the study treatment and B the control, then over a period of perhaps four months the subjects may be allocated to take medication a month at a time in the order ABBA or BAAB without knowing which is which.
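Such a crossover schedule is easy to generate. A small Python sketch (the ABBA/BAAB sequence pair is taken from the text; the function itself is purely illustrative):

```python
import random

def crossover_schedules(n_subjects, seed=None):
    # Each subject is randomly assigned one of the two balanced sequences.
    # 'A' is a month of the study treatment, 'B' a month of the control;
    # neither subject nor assessor knows which letter is which.
    rng = random.Random(seed)
    return [rng.choice(["ABBA", "BAAB"]) for _ in range(n_subjects)]

print(crossover_schedules(6, seed=7))
```

Because every subject receives both A and B for equal lengths of time, each acts as his own control, which reduces the variation between the groups.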
Exclusion criteria for a trial usually include having other diseases and having to take other medication. For this practical reason, most subjects in trials of drugs for heart failure have been in their 50s or 60s. However, when a doctor sees a patient with heart failure, that person is usually around 85, with other conditions such as osteoarthritis, and taking multiple medications. Therefore the doctor may ask himself, “Is the evidence available really applicable to the patient in front of me?” The answer is that it is not completely satisfactory, but it is the best evidence available. Nevertheless a certain amount of common sense or intuition may be required in terms of doses given and how fast they are stepped up.
There is a tendency to find that subjects in trials are young, male and white. Can the findings be translated to those who do not fall into that category? It may simply be the best evidence that we have. Race can be important. Recommendations for treating hypertension (high blood pressure) in Afro-Caribbean people are different from others. The criteria for preventing diabetes and heart disease are stricter for those whose racial origin is from the Indian subcontinent. Race, gender and age do matter but in few fields is this fully explored.
About 90% of drug research is funded by the pharmaceutical industry and, whilst it would be wrong to regard all such research as tainted, companies can sometimes take liberties, to put it mildly. Contracts that give the company too much say in what, if anything, gets published are a major problem. Selective publication of only flattering results can happen, and it distorts the picture. Sometimes new drugs are compared not with the market leader or best currently available therapy but with an old-fashioned drug with less efficacy and more side-effects than the alternatives. They might be compared with the market leader but at a lower dose than would normally be used. They may even be compared with a higher dose than usual to get flattering results with regard to side-effects. All this is dishonest.
After all the effort to conduct a trial and after patients have agreed to cooperate, it is important to try to get the research published. This is not simply an ego trip on behalf of those who wish to see their names in print. It is an important way of disseminating information that will help to improve the management of patients. Therefore, research should be written up fairly and accurately and submitted to journals for publication as soon as is reasonable. The WHO would like all research to be submitted within 18 months of completion.
If results fail to show the hoped-for significance in a trial, there is a temptation to look further at the figures. Are they significant if you look at just the males or just the females? If you split them into age groups, does any group do better? If there are plans to do this, it must be clear at the outset, and each group must be adequately powered as outlined above, meaning adequately large. Otherwise the researchers are looking at many possible variations to see if something can be salvaged to give a positive result. If you look at twenty possibilities, one is likely to come up as significant at the 5% level, because the 5% level means 1 in 20. This is cheating and it is not valid.
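The 1-in-20 arithmetic can be checked directly. The short simulation below (a sketch in Python; the numbers are illustrative, not from any real trial) imagines a drug with no real effect, tests 20 subgroups at the 5% level, and counts how often at least one comes up “significant” purely by chance. The analytic answer is 1 - 0.95^20, roughly 64%.

```python
import random

random.seed(1)  # fixed seed for reproducibility

trials = 100_000  # number of simulated "studies", each testing 20 subgroups
hits = 0
for _ in range(trials):
    # Each subgroup test has a 5% chance of a spurious "significant" result.
    if any(random.random() < 0.05 for _ in range(20)):
        hits += 1

print(f"Analytic chance of at least one false positive:  {1 - 0.95**20:.3f}")
print(f"Simulated chance over {trials} studies:          {hits / trials:.3f}")
```

Both figures come out at around 0.64, which is why an unplanned trawl through twenty subgroups will usually find “something”.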
A professor of cardiology did a sub-group analysis of the ISIS-2 trial for the management of patients who have had heart attacks. He concluded that those whose star signs were Gemini or Libra were more likely to have an adverse outcome, whilst those born under other star signs had a good chance of a positive outcome. This was not a tribute to astrology. It was tongue in cheek, to prove a point about the futility of unplanned subgroup analysis, and the point was an important one.
There is a great temptation not to bother to report a trial that had negative results. This is wrong. It is as important to know what does not work as what does. Perhaps there have been 20 trials on a topic but only eight found in favour of the drug whilst 12 found no benefit. If only the eight favourable ones are published, anyone who reviews the literature will get a very biased view. It also distorts meta-analysis, as will be discussed later. Many people believe that journals are not interested in negative findings, but this is untrue. [17] Failure to publish negative results is much more often due to the authors. An American paper from 1992 followed up 737 trials that had been registered. Of 124 that had not been published, only six had been submitted but rejected. [18]
Some researchers have examined papers and found that the criteria and reporting of some trials are at odds with how they were presented to the ethics committee for approval. [19] This suggests that the goalposts were moved when the results came in. There have been reports that pharmaceutical companies who fund research on their drugs examine the data as they come in: if the results are looking unduly good, the trial may be stopped prematurely to ensure that the good effect is not lost, and if the results are looking poor, the trial may be abandoned. This is unacceptable. Nowadays a trial should be registered before it is conducted, to ensure that it is conducted as planned. It may also be abandoned if a bigger, better trial is currently underway. It is also important that the results are reported, whether good or bad for the drug.
There might be a reluctance by the authors to present negative results, or a reluctance of the editors of journals to publish them. There may also be commercial pressure from the pharmaceutical companies who financed the trials to rein in the less flattering ones. In the western world the need for negative results is accepted in principle, although not always in practice. In the 1990s someone examined RCTs from China and found that 112 of 113 published in recent years had positive results. Results from Eastern Europe are also almost always positive. However, it would be wrong to assume that the western world does not also suffer from selective publication. A company may fund perhaps 10 studies into the efficacy of a drug but choose to have only the three most favourable ones published. To pick and choose is dishonest, as it distorts the literature. It is rather like going through the results of a trial, looking at each subject in turn and saying, “That looks good. We’ll keep those results. That’s rather poor. We’ll dump those results.” There is good evidence that this has happened, and not just on the odd occasion. Ben Goldacre in Bad Pharma goes into considerably more detail (see Further Resources).
After the Trial
The RCT will answer whether a treatment is effective, but there are other very important questions to ask. One is the nature and frequency of adverse effects. This can really only be established through rigorous post-marketing surveillance, as adverse effects may be too rare to show up during a trial and give a true picture. The Committee on Safety of Medicines (CSM) has been replaced by the Medicines and Healthcare products Regulatory Agency (MHRA). [20] This is the government agency responsible for ensuring that medicines and medical devices work and are acceptably safe. The MHRA is an executive agency of the Department of Health. The website has a section for public and patients. It is important that adverse effects are reported, especially for new drugs.
Linked to the MHRA is the Joint Committee on Vaccination and Immunisation (JCVI), which performs a similar function for vaccines.
The thalidomide tragedy [21] is one reason why reporting adverse effects is so important. When thalidomide was introduced, the testing of drugs was totally unsatisfactory. The limb deformities that it caused are normally rare, and so they attracted attention. Abnormalities such as hare lip and cleft palate in the children of mothers who take some of the older anticonvulsants for epilepsy were more difficult to detect, as they are more common malformations. Great care should be taken in prescribing drugs in pregnancy, especially in the early stages. A number of other drugs have been found to produce unwanted effects outside of pregnancy, and they have had to be withdrawn after release. There is also a European Medicines Agency (EMA), which takes data from all of Europe. In 2010 it decided to withdraw approval for rosiglitazone, a newer drug for diabetes, because of the risk of heart disease. It had been on the market for several years.
I hope that this has given an insight into randomised controlled trials, the gold standard of “does it work” in clinical research. It does not prove anything beyond any doubt. There is no absolute certainty about anything. It just shows that there is a very strong possibility that something does work. In criminal law the prosecution does not have to prove guilt beyond any shadow of a doubt but “beyond reasonable doubt”. In civil law the decision is based on “the balance of probabilities”. This means “more likely than not” or a 51% chance.
The basic idea of RCTs is fairly simple but the correct methodology is very demanding. Bad research is worse than useless. It may mislead and cause patients to be inappropriately treated.
Meta-analysis has already been mentioned. It takes a number of RCTs which may or may not have been adequately powered, and it adds them together to produce, in effect, one massive and statistically highly powered trial.
There are a number of problems with this technique. As mentioned before, not all the trials may have been identical in terms of methodology and end points. There is also the question of selective publication. There is a tendency to publish only positive results, especially if the underlying power of the trial was dubious. It can be difficult to get adequate numbers for rare diseases. It is as important to know what does not work as what does. Furthermore, if only the positive results are added together for a meta-analysis, this will cause considerable distortion of the end result. It is taking the results that suit you and ignoring the rest. This is called “cherry picking”. The problem is: how do we find out whether there are many unpublished results?
A number of techniques have been employed to detect unpublished trials, and the results are often worrying, as failure to publish negative results seems to happen rather often. Some people have approached ethics committees for a full list of trials that had been given approval for a certain type of drug, along with the name and address of the lead researcher, and then asked that person whether the results were published and, if so, where. Most such studies found that of the trials with positive results, most or all were published; of the studies with negative results, substantially fewer were published. [22] This means that there is a built-in bias that may lead reviewers, including NICE, to recommend a treatment that is nothing like as effective as they think, or possibly ineffective, and they may be recommending unnecessarily expensive drugs.
Funding for medical research is highly competitive, and so an obvious source when testing a drug is the company that produces it. It may also be argued that the company should bear the cost of research and development. However, he who pays the piper calls the tune. There is considerable evidence of selective publication at the behest of the pharmaceutical industry where it has funded research. One study that involved sending a questionnaire to 101 lead researchers found no significant difference in the publication rate between positive and negative results. [23] However, this was exceptional, and what was different about this group was that the trials were funded by the NHS Research and Development Fund and not the industry.
Finding these unpublished results can be very difficult, and not everyone who has produced a negative study will comply, especially if they are employed by the organisation that is likely to be exposed. There may even be non-disclosure agreements or “gagging clauses” in a contract. A cunning and devious statistical technique has been developed called funnel plotting with cut and fill, also called trim and fill. The basic principle is that the larger studies, which are more highly powered from a statistical perspective and hence more reliable, should produce results that are fairly similar. The smaller studies may produce a greater spread, but the spread should be even on both sides of the mean (the average) for the whole. If a graph is drawn with the number of subjects in the trial on the Y-axis and the result on the X-axis, it should look rather like the one below. It is shaped like an inverted funnel, hence the name funnel plotting.
The dots form a triangle, rather like an inverted funnel.
If a line is dropped down from the highest point, the distribution of the points should be fairly symmetrical on each side. If most or all the smaller studies seem to show a greater effect than the bigger studies, this suggests that there is selective publication. Very often the graph may look like this:
In reality the dots may be rather more skewed, like this:
This shows a very uneven distribution, with most of the smaller trials showing far more positive results than the bigger ones. The technique of cut and fill, or trim and fill, involves first removing the small trials. The mean (average) is then calculated from the bigger ones, and the small trials are replaced. Along with them are added the “missing” trials, which are the mirror image of the smaller trials: at the same height above the X-axis and the same distance from the mean, but on the opposite side of it. Now we do have a symmetrical funnel shape. This may sound rather like making up your own results, but it is accepted and validated practice. [24]
After correction, the heavy dots are the original and the circles are the “missing” trials
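The mirroring step can be illustrated with a few made-up numbers. The sketch below is a deliberately simplified version of the idea; the real trim and fill method (Duval and Tweedie, reference 24) uses rank-based estimation rather than this naive mirroring, and the trial figures are invented for illustration only.

```python
# Each trial: (number of subjects, observed effect size). Hypothetical data:
# the large trials cluster near a true effect of 0.10, while the published
# small trials are all skewed towards flattering results.
trials = [
    (2000, 0.11), (1800, 0.09), (1500, 0.10),   # large trials
    (200, 0.35), (150, 0.40), (120, 0.30),      # small trials, all one-sided
]

LARGE = 1000  # assumed cut-off between "large" and "small" trials

# Step 1: "trim" - estimate the true effect from the large trials only.
large = [(n, e) for n, e in trials if n >= LARGE]
mean_effect = sum(e for _, e in large) / len(large)

# Step 2: "fill" - for each small trial, add a mirror-image trial of the
# same size, the same distance from the mean, but on the opposite side.
small = [(n, e) for n, e in trials if n < LARGE]
mirrored = [(n, 2 * mean_effect - e) for n, e in small]

corrected = trials + mirrored
corrected_mean = sum(e for _, e in corrected) / len(corrected)

print(f"Mean of the large trials alone: {mean_effect:.3f}")
print(f"Mean after cut and fill:        {corrected_mean:.3f}")
```

With these illustrative figures, a naive average of all six published trials would be 0.225, inflated by the one-sided small trials; after filling in the three mirror-image trials, the mean returns to 0.10, the value suggested by the large trials alone.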
This technique will be shown again in the section on Homeopathy. A publication in the Lancet in 1997 was a meta-analysis of methodologically sound publications about the effectiveness of homeopathy, and it concluded that homeopathy works. [25] However, in 2001, a paper in the British Medical Journal (BMJ) criticised that paper and, using funnel plotting with cut and fill, showed that homeopathy does not work. [26] Negative results had not been published.
Unpublished results are a very important source of bias, leading to inappropriate recommendations for how to treat patients. The Cochrane Collaboration has recommended that whenever more than 10 trials are examined, there should be funnel plotting. [27]
Outside of Medicine
RCTs are the gold standard for assessing if medical treatments work but they can be used in other ways too. Here is one that someone might like to try.
Astrology has been around for millennia. Every day, week or month, magazines and newspapers print horoscopes, but are they accurate? “They must be,” some will say, “as they have been around for so long.” Longevity is not the same as effectiveness. Blood-letting and purging were used for many centuries and even had a theoretical base. The theory was wrong. We can test whether astrology has any validity.
If it is soundly based I would expect to find that astrologers say similar things about the same star sign at any time. I am not convinced that this is true but I am not an avid reader of horoscopes. Around 1960, when supersonic airliners were first being considered, the British, French, Americans and Soviets all revealed their designs at an air show. All were a revolutionary new shape but all were remarkably similar. Immediately accusations of industrial espionage started to fly. They had all put similar data into similar design programmes in their computers and so they produced similar results. Look at the similarity of small hatchback cars. For this reason I would like all subjects in the trial to abstain from reading other horoscopes during the trial.
We would need a large number of subjects who are prepared to take part in the trial and not read other horoscopes for the duration. Of course the number needed would have to be calculated as shown above; at a guess, I suggest that it may well be at least 200 subjects. Each week, all subjects could be given a horoscope that may or may not be theirs, without knowing which. They are asked to rate it over the next week for accuracy, then tick one answer to the following question:
How well do you think this prediction applied to your week?
- Very well
- Fairly well
- Neither well nor poorly
- Fairly poorly
- Very poorly
This enables the prediction to be scored from 1 to 5. Since horoscopes may not have a sudden cut-off as we move from one day to the next and one star sign to the next, subjects with star signs adjacent to the actual one should be ignored. The average score is taken for those whose star sign it was, and also for the rest. This means that for every subject whose star sign it was, we would expect nine controls, as there are twelve signs of the zodiac and two are ignored. Eventually the data are collected and a statistical analysis done. The question is: are the scores for those who received the correct horoscope significantly different from the scores of those who received the wrong one?
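As a sketch of how that final comparison might be analysed (the scores below are invented, and the permutation test is one reasonable choice of analysis, not something prescribed above), the two groups’ mean scores can be compared and the difference tested against chance:

```python
import random
import statistics

random.seed(7)  # fixed seed for reproducibility

# Hypothetical weekly accuracy scores (1 = very poorly ... 5 = very well),
# generated under the null hypothesis that the star sign makes no
# difference, so both groups score alike on average.
correct_sign = [random.randint(1, 5) for _ in range(20)]   # got their own horoscope
wrong_sign = [random.randint(1, 5) for _ in range(180)]    # ~9 controls per subject

observed = statistics.mean(correct_sign) - statistics.mean(wrong_sign)

# Permutation test: shuffle the group labels many times and count how often
# a difference at least as large as the observed one arises by chance.
pooled = correct_sign + wrong_sign
n = len(correct_sign)
extreme = 0
reps = 10_000
for _ in range(reps):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n]) - statistics.mean(pooled[n:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / reps
print(f"Observed difference in mean score: {observed:.2f}")
print(f"Permutation p-value: {p_value:.3f}")
```

If astrology had any validity, the correct-sign group would score consistently higher and the p-value would be small; with scores generated at random, as here, no such difference is expected.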
Would anyone like to take this on? If it is well done I would suggest approaching New Scientist or Nature for publication.
- Sibbald B, Roland M. Understanding controlled trials: why are randomised controlled trials important? BMJ 1998;316:201. [full text] http://www.bmj.com/content/316/7126/201.full
Written for doctors and some may find it rather heavy going
- Sterne JAC, Egger M, Smith GD. Systematic reviews in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001;323:101-105. [full text] http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1120714
Written for doctors but in a style that is comparatively easy to comprehend
- Ben Goldacre. Bad Science. Fourth Estate, London 2009. Chapter 14. Bad Stats
The chapter on bad statistics or misuse of statistics in an excellent book
- Ben Goldacre. Bad Pharma. Fourth Estate, London 2012. Chapter 1, Missing Data. This goes into rather more detail about how data go missing to give unduly flattering results for trials. It gives rather more on selective publication but does not mention funnel plotting.
- Clinical Trials NHS
Another excellent resource from the NHS, explaining clinical trials, including how to get involved.
- Craft AW. The first randomised controlled trial. Arch Dis Child 1998;79:410. doi:10.1136/adc.79.5.410
- Chalmers I, Clarke M. Commentary: the 1944 patulin trial: the first properly controlled multicentre trial conducted under the aegis of the British Medical Research Council. Int J Epidemiol. 2004 Apr;33(2):253-60.
- Online Etymological Dictionary.
- Friedman JH, Dubinsky R. The placebo effect. Neurology 2008;71:e25. doi:10.1212/01.wnl.0000326599.25633.bb
- Moustgaard H, Clayton GL, Jones HE, Boutron I, Jørgensen L, Laursen DLT et al. Impact of blinding on estimated treatment effects in randomised clinical trials: meta-epidemiological study. BMJ. 2020 Jan 21;368:l6802 [full text]
- Anand R, Norrie J, Bradley JM, McAuley DF, Clarke M. Fool’s gold? Why blinded trials are not always best. BMJ 2020;368:l6228
- Lichtenberg P. The role of placebo in clinical practice. Mcgill J Med. 2008 November; 11(2): 215–216. [full text]
- Lapierre YD. Ethics and placebo. J Psychiatry Neurosci. 1998 January; 23(1): 9–11.
- Bandolier. Why bogus therapies seem to work.
- Benedetti F, Amanzio M, Vighetti S, Asteggiano G. The Biochemical and Neuroendocrine Bases of the Hyperalgesic Nocebo Effect. The Journal of Neuroscience, November 15, 2006 • 26(46):12014 –12022 [full text]
- Gospel according to Luke. Chapter 18 verse 42
- Gospel according to Mark. Chapter 6 verses 1-6.
- Stoessl AJ. Expectation and dopamine release: mechanism of the placebo effect in Parkinson’s disease. Science 2001;293(5532):1164-1166. doi:10.1126/science.1060937 [full text]
- Meissner K, Kohls N, Colloca L. Introduction to placebo effects in medicine: mechanisms and clinical implications. doi: 10.1098/rstb.2010.0414 Phil. Trans. R. Soc. B 2011 366, 1783-1789. [full text]
- Kaptchuk TJ, Miller FG. Open label placebo: can honestly prescribed placebos evoke meaningful therapeutic benefits? BMJ. 2018 Oct 2;363:k3889 [full text]
- McCambridge J, Witton J, Elbourne DR. Systematic review of the Hawthorne effect: new concepts are needed to study research participation effects. J Clin Epidemiol 2014;67(3):267-277. [full text]
- Olson CM, Rennie D, Cook D, Dickersin K, Flanagin A, Hogan JW Publication bias in editorial decision making. JAMA. 2002 Jun 5;287(21):2825-8.
- Dickersin K, Min YI, Meinert CL. Factors influencing publication of research results. Follow-up of applications submitted to two institutional review boards. JAMA 1992 Jan 15;267(3):374-8.
- Leung DY, French JK. End points in clinical trials: are they moving the goalposts? Heart. 2006 Jul;92(7):870-2. [full text]
- Medicines and Healthcare products Regulatory Agency website. http://www.mhra.gov.uk/Aboutus/index.htm
- The Thalidomide Story, Science Museum.
- Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan AW Cronin E, et al. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One. 2008 Aug 28;3(8):e3081. [full text]
- Cronin E, Sheldon T. Factors influencing the publication of health research. Int J Technol Assess Health Care. 2004 Summer;20(3):351-5.
- Duval S, Tweedie R. Trim and fill: A simple funnel-plot-based method of testing and adjusting for publication bias in meta-analysis. Biometrics. 2000 Jun;56(2):455-63.
- Linde K, Clausius N, Ramirez G, et al; Are the clinical effects of homeopathy placebo effects? A meta-analysis of placebo-controlled trials.; Lancet. 1997 Sep 20;350(9081):834-43.
- Sterne JAC, Egger M, Smith GD.; Systematic reviews in health care: Investigating and dealing with publication and other biases in meta-analysis; BMJ, Jul 2001; 323: 101 – 105 [full text]
- Turner L, Boutron I, Hróbjartsson A, Altman DG, Moher D. The evolution of assessing bias in Cochrane systematic reviews of interventions: celebrating methodological contributions of the Cochrane Collaboration. Syst Rev 2013;2:79.
This website is now completed, although I shall continue to do updates. The following list shows the sections or chapters. Just click on the topic in blue to go to that part of the site.