While research can inform massage therapists about what they do, how can they tell the difference between good research and bad ones?

Does massage therapy actually lower cortisol level in our body? If you search on Google Scholar or PubMed, you might find numerous studies that say, “Yes!” However, if you dig a little deeper and examine the literature on a broader scale, it actually says, “Not really.”

Psychologist Christopher Moyer broke down the conflicting research about massage therapy’s effect on cortisol levels at the Registered Massage Therapists’ Association of British Columbia’s Mental Health Symposium in 2017. One one hand, some research—primarily done at the Touch Research Institute (TRI) in Florida—found that a single dose or multiple doses of massage can reduce cortisol significantly (1,2) while Dr. Moyer’s research team found “very little, if at all.” (3)

Moyer et al. reviewed 19 qualified studies with a total of 704 participants (614 adults), the data points stretched across the chart. Dots with long black horizontal lines that cross the vertical dotted line (the mean) tell us how strong or weak the effect of the treatments were. This brought some “Hmmm” and slight smiles among the registered massage therapists (RMTs) at the conference.

Massage research is still quite low in quantity and quality compared to physical therapy and nursing, and it still runs into many problems that other healthcare professions face. As Moyer had presented, massage therapists should not take what they read for its face value, and they should critically appraise research within and without our field. So what should therapists consider when they read research, especially in manual therapy?

How Massage Research Effects Practice and Patient Care

An editorial published in the British Journal of Sports Medicine (BJSM) suggested why physical therapists need to adopt a strong and honest evidence-based practice to avoid misleading the public and to draw attention to interventions that are effective and efficacious. The authors gave an example where patients with acute low back pain should get physical therapy first as the first line of defense, but this statement from the American Physical Therapy Association (APTA) is not backed up by sound evidence. In fact, they cited that such endorsements ignored “high-quality evidence that early physical therapy is more costly and provides no benefit over usual care of [low back pain],” and “low-quality evidence that early physical therapy likely results in higher rates of opioids, imaging, injections, and surgery compared with no physical therapy.” (4)

Misleading the public can deepen the stereotypes and promote a negative criticism of our profession in their eyes. Therefore, understanding research and watching out for false claims can initially improve what we market and say to our clients and patients. However, with so many flaws and conflicting information in the scientific process—never mind fake news and clickbaits—where do we start to disseminate what is good, poor, and everything in between?

“I recently heard a man interviewed on my local public radio station complain about the difficulty of keeping up with what he called the ‘swerves of scientific wisdom’: ‘I spent two tours in Iraq as a gunner,’ he said, ‘and I know how hard it is to hit a moving target. I wish these scientific experts would just hold still.’

“But that’s the thing. Holding still is exactly what science won’t do.”

~ David P. Barash, evolution biologist and psychology professor, Aeon (5)


1. Prior plausibility in massage research

If your mom told you that a great white shark had attacked her boat while she went fishing, showed you the damage to the hull, and you know the waters where she fished is home to sharks, then you would probably believe her. But if she told you that a two-headed sea serpent had attacked her and showed you the same evidence, you would probably ask for more evidence.

This is basically what prior plausibility implies: how valid is a claim based on the existing body of knowledge? In that example, do two-headed sea serpents exist given what we know about marine biology and paleontology?

Our understanding of the natural world and the universe comes from several centuries of observations and experiments that evolved in small increments. Even though these changes can sometimes overturn established facts (e.g. number of planets in the solar system, primates do not have empathy or use tools), new theories are still based on the fundamental principles of physics, chemistry, biology, and other natural sciences. Thus, healthcare professionals should not coin up an idea out of thin air and establish it as a fact (and sell it!) if it contradicts the existing body of knowledge. To do so, we would need an immense amount of data that overturns the existing knowledge.

“The amount of evidence necessary to add one small bit of incremental understanding about a phenomenon is much less (and should be less) than the amount of evidence necessary to entirely overturn a well-established theory,” neurologist Steven Novella had explained on his blog when someone criticized the application of prior plausibility in medicine. (6)

“Another way to look at it is this: the amount of evidence necessary to overturn an established conclusion is proportional to the amount of evidence that established the conclusion in the place. Failing to consider prior probability would also mean overturning a mountain of prior evidence with one tiny new bit of evidence. This amounts to favoring the new evidence simply because it is the most recent, but there is no justification for this approach.” (6)

2. Methodology in massage research: most important yet ignored

acupuncture research sham

Photo: Max Pexels https://www.maxpixel.net/Acupuncture-Treatment-Physical-Therapy-Therapy-1698832

We tend to pay more attention to the conclusions and headlines of the research paper than how it is done (methodology) and what the researchers found (results). That is probably because they are quite dry and foreign to us when we see those numbers, symbols, and abbreviations in those parenthesis (EX: One-way ANOVA: F(1, 20) = 8.496, p<.0035, ηp2 = . 198). If we do not have a strong background of statistics and methods, our minds may glaze over the text while our brain enters a screensaver mode, overlooking data that could tell us more about the validity of the experiment.

“The Introduction and Discussion sections are subjective and could be very well be biased to make the study look favorable and important,” Dr. Anoop Balachandran said in an online interview with Massage & Fitness Magazine, who is the Assistant Professor of Exercise Science at Queens College at the City University of New York in New York City. “Remember that the methods and the results section are the only objective sections of a study.”

He suggests that clinicians follow the “PICO” of the study, which stands for population, intervention, control, and outcomes. Balachandran describes PICO as follows:

“P stands for population. It tells us what the population is used in the study. For example, it could be people with chronic pain, acute pain or hip or knee osteoarthritis, older adults, males and so forth. Research is very population specific and an intervention that works for acute pain or in older adults may not work well in younger or who have chronic pain. So, define the population clearly.

“I stands for intervention. This could be manual therapy or pain education or exercise and so forth. It is also important to understand the specifics of the intervention. For example, if an exercise intervention, specifics—such as how many days/week, intensity or dosage, duration, and so forth—should be noted. The specifics of an intervention can make a big difference.

“For example, a 10-week intervention could have different outcomes compared to a similar intervention but with a longer duration of one year.

“C means control or comparator group. This tells us what researchers are comparing the intervention against. People always talk about the intervention but ignore the control group. For example, a new exercise intervention could look spectacular if my control group is just a sedentary control, but it will look mediocre if I compare it to group that is given a standard exercise protocol.

“O stands for outcomes: What are their main outcomes? The primary outcome for manual therapists’ studies are pain or/and physical function. What people care about is if they feel better or move better. For example, improving ROM (range of motion) is usually an outcome, but this outcome has no meaning in a person’s life whatsoever. This outcome only becomes meaningful if the improved [range of motion] helps with tasks that were previously impossible, like washing hair or wearing a jacket. So outcomes do matter.

“Once you have identified the PICO, you have managed to understand the bulk of the study in a very short time.”

Regardless of which profession the research is focused on how well researchers do on methodology applies to all randomized controlled trials (RCTs), Balachandran explained. Some of these factors are:

1. Did the participants have an equal chance of falling in either of the groups ( were they randomized?);
2. Could the participant or investigator have predicted which groups they will fall into (concealed allocation);
3. Were the participants and investigator blind to the treatment (double-blind);
4. Did they include all the participants in their final analysis (intention to treat)?

“In my opinion, the most important methodological concern for manual therapy studies is blinding,” Balachandran said. “Did the participants knew if they were getting the real or the sham treatment? Ideally, we want the sham treatment to look and feel exactly the same as the intervention minus the part that is supposedly giving the benefits of the intervention.

One example he gave is studies on acupuncture and its effect on certain pain relief. When such trials compare actual acupuncture versus fake acupuncture, where the needle retracts into a sheathe after the needle contacts the skin, both types show almost the same effects. (7)

“Also, in good methodological studies, they also report if the blinding worked or not by asking the participants after the study.” Balachandran said. “Of course, the doctor is aware that he is giving the sham or the real treatment so there is a high risk of bias. So there is some bias that cannot be completely eliminated even in the most rigorous studies. In short, studies with good methodology should have a well-validated sham treatment and also report if the blinding worked based on the feedback from the participants.”

Balachandran also considered dropouts in a study that can affect its bias. “This is one aspect that most folks ignore or are not aware of. It is now pretty clear that dropouts in a study could be related to the prognosis. In manual therapy studies, this simply means that people who drop out from a study might be the ones who didn’t experience any benefits or may have worsened their pain. So the people who remain in the intervention group are the ones in which the treatment worked well and naturally the group averages tend to be better than it truly is. This becomes more important especially when one group has greater dropouts than the other group.

“We will never know if the results of a study is the actual truth, but the better the methods, the more confident we are that we are closer to the truth. After all, science is all about finding the truth.”

3. Does the effect size of massage research tell us whether a treatment is worth paying for?

effect size cohen's d research

The close the two comparisons, the less differences there are. Image: Skbkekas

In research, an effect size tells us how much difference there are between two treatment groups, and knowing it can help us recognize how effective and efficacious the treatment is. If a research paper says that patients who were treated with XYZ intervention had less pain than those who had received no intervention or had a different type of treatment, how big was the difference? After all, technically, researchers could claim a tiny difference between groups can be “better,” in which some people can market that intervention as “effective.”

Effect size, also known as Cohen’s d, is measured on a scale where 0.8 is ranked as “high,” 0.5 as “moderate,” and 0.2 as “low.” Back to the Moyer’s 2011 review on the effect of massage on cortisol levels, the researchers pooled 11 studies on single-dose massage sessions as the first treatment in a series of treatments with a total of 460 participants and found that “[massage therapy] did not reduce cortisol significantly more than control treatments” with the effect size (d) of 0.15. When they examined eight studies of single-dose sessions with 307 participants as the last treatment of a series, they also found the same result (d = 0.15).

Among multiple-dose massage sessions, they also found low effect sizes (d = 0.12) among 16 studies with a total of 598 participants. When these studies are split into adults and children, the adults (n = 508) effect size dropped more than 50 percent (d = 0.05). However, the children group ranked higher with a “moderate” d = 0.052. The last result may seem promising, but Moyer et al. cautioned that this data is based on a tiny number of studies and likely is “vulnerable to the file-drawer threat” where many studies with negative outcomes are not published.

Also, larger effect size does not always mean a treatment is “better,” nor does a smaller effect size mean it is “not very good.” You need to consider context of the data to making better meaning. For example, if being active regularly can improve mortality by 0.09 over five to six years, this small effect can still save many lives by encouraging people to be more active. But if a treatment only reduces back pain by 0.09 when compared to standard care or placebo, then it may not be in the best interest for patients to get and pay for that treatment.

4. Clinical equipoise: How much bias are there in the study?

In some ways, manual therapy research is similar to mixed martial arts where one modality is “pitted” against another to see which is “better.” This bias not only may lead them the everything-is-a-nail mentality, it also can skew the outcomes of manual therapy research. As with any scientific research, the goal is to be as objective as possible, hence the emphasis of clinical equipoise, which is the assumption that one intervention is NOT better than another.

Physiotherapists Chad Cook and Charles Sheets wrote that to have true equipoise, the researchers must have “no preference or is truly uncertain about the overall benefit or harm offered by the treatment to his/her patient. In other words, the clinician has no personal preconceived preferences toward the ability of one or more of the interventions to have a better outcome than another.” (8)

While this may not seem very practical or even realistic in actual research, it is something researchers should strive for. Even with the best intentions, researchers sometimes unconsciously set up the experiment in a way that favors the tested intervention or undermines the control group. It is also possible that the “placement of importance, enthusiasm, or confidence associated with one’ expertise in an intervention” can influence the experiment’s outcome, which also includes how well the patients perceive the treatment. (8)

Having pure personal and clinical equipoise may be unlikely to avoid to do because it is impossible to blind the practitioner. It would be like working blind-folded while providing the treatment, which is not practical.

5. Is the massage research basic or translational? How well could it be applied to hands-on work?

san diego pain summit massage research

Applying therapeutic exercise research to practice at the 2017 San Diego Pain Summit with Ben Cormack. Photo: Nick Ng

You might have read something about researchers using fruit flies to understand how genetics, behavior, and certain diseases work. Back in the late 1970s to early 1980s, Dr. Christiane Nüsslein-Vollhard, who is a biologist, made some discoveries about how fruit flies grow and develop with her colleagues Dr. Eric Wieschaus and Dr. Edward B. Lewis. She had no idea that their research on Drosophilas could end up being one of the main staple animals that many biomedical research use in the next three decades.

“We started out in our research with a deep interest in understanding the origin and development of pattern during embryogenesis. None of us expected that our work would be so successful or that our findings would ever have relevance to medicine,” Nüsslein-Vollhard said in 1995 when she received her Nobel prize. “But it has now become apparent that at least some of the fundamental principles we learned from the fly apply to higher vertebrates as well, including humans.” (9)

What she, Wieschaus, and Lewis did was basic science, research done for the sake of understanding a specific phenomenon with no or little consideration for its real-life applications. What came after was translational science, using fruit flies for a variety of scientific disciplines, including evolutionary biology and behaviorial genetics that could identify certain genes that increase the likelihood for humans to get a certain disease.

However, in manual therapy research, we should be careful about extrapolating basic research to our hands-on practice since the researchers may not often have practicality of their work in mind. Conversely, sometimes translational research in manual therapy do not consider basic science and may come up with false positives.

“There is a tendency to not scrutinize basic mechanisms of research through the same critical lens that we do clinical trials,” Dr. Neil O’Connell explained in an email interview with Massage & Fitness Magazine. “But this is a mistake as the field is troubled by small sample sizes and avoidable biases that are not controlled.” He cited a paper by Professor Andrew Rice who highlighted this problem in pre-clinical pain research and potential solutions in Pain in 2016.

“Beyond bias and quality, folks often take a biological phenomena, perhaps observed in a cell or animal model, and use it as evidence that a particular therapy is effective. That evidence may be directly related to the mechanistic claims of the treatment, but it often is rather more indirect. They will often presume that any evidence of some physiological change is evidence that a treatment is plausible. You see this lots with acupuncture brain imaging studies in humans.”

O’Connell cited a study in 2011 that finds sticking needles in certain parts of the forearm can change how the brain perceives pain. However, such changes does not mean the interventions “work.”

“Brain activation in response to needling neither validates acupuncture nor provides a cogent mechanism for therapeutic action. That it activates sub-cortical networks involved ‘endogenous pain modulation’ also does not distinguish acupuncture as a possibly effective active treatment since these networks are also implicated in the placebo response,” O’Connell explained in Body in Mind.

“Also, as I have often repeated, given that our best evidence clearly indicates that it really doesn’t matter what you do with those needles, the sensory qualities of, or cortical responses to acupuncture seem unlikely candidates for a mechanism of action. It is not that the data in this study tells us nothing important. It adds to an existing body of data regarding how sensory input is cortically processed. But as a study of therapeutic mechanisms it attributes possible mechanisms to an effect that the best evidence tells us does not exist.” (10)

“Some how the finding that some physiological effect exists is taken to legitimize the intervention,” O’Connell said. “Of course, there is the age-old problem of how far experimental models—particularly in animals but also in healthy humans—can really tell us about the experience of clinical pain, stripped as they are of the context of threat, fear, and real suffering and consequences.

“This kind of research is used to build biological plausibility, and that is really valuable. But we tend to forget that even when a biologically plausible mechanism exists, it is not a given that it will translate into a meaningful clinical effect. Think of the scale of failed new pharmaceuticals where the biological plausibility work all looked good. None of this is to denigrate the value of various types of evidence. The danger lies in taking any evidence, not matter how removed from the patient, as some form of tacit legitimization of your therapy du jour.

“Beyond bias and quality, folks often take a biological phenomena, perhaps observed in a cell or animal model, and use it as evidence that a particular therapy is effective. That evidence may be directly related to the mechanistic claims of the treatment, but it often is rather more indirect. They will often presume that any evidence of some physiological change is evidence that a treatment is plausible.”

6. Do positive outcomes validate a treatment?

Tying back to outcomes, when we see our patients feel better after a treatment, we often give credit to our treatment for making them feel better. However, this alone cannot be a validation to the positive (or sometimes negative) outcomes because outcomes only measures outcomes, not the effects of the intervention. In other words, outcomes are not reliable factors that indicate our hands-on intervention is the sole or primary cause of pain relief, improved mobility, and other outcomes.

This is because many factors other than the intervention can affect the outcome, such as the non-verbal interactions between the therapist and the client or patient, the natural regression of pain, and patients’ beliefs and prior experience to the intervention and interaction with their previous therapist. Sometimes a positive outcome can happen without any interventions; and we can also say the same thing about negative outcomes which doesn’t always mean that the intervention “sucks.” It could have gotten worse without the intervention. (11)

Relying on outcomes can also blind us from acknowledging those who do not respond well to the treatment, which is a demonstration of survivorship bias—we notice and acknowledge successful outcomes while sweeping aside unsuccessful ones. Even though high quality randomized-controlled trials are the best method to use, Herbet et al. suggest that we should avoid the being extreme on either side of the decision-making process, where one side relies on outcomes and the other side relies strictly on randomized controlled trials.

They wrote, “We think that, when there is evidence of effects of intervention from high quality clinical trials, a sensible approach to clinical decision-making lies somewhere between the two extremes of the fully empirical approach and a hard-line approach in which clinical decision-making is based only on high quality clinical research without regard to outcome. In this approach, extreme clinical observations (very good or very poor outcomes) are considered likely to be ‘real’ (bias is unlikely to have qualitatively altered the clinical picture), so they are used to guide clinical decision-making.

“However, the qualitative interpretation of typical observations (small improvements in outcome) could plausibly be altered by bias, so they are essentially ignored. In other words, this approach suggests that clinical decision making should be influenced by observations of very good and very poor outcomes, but should not be influenced by less extreme observations.” (11)

When there is no good available evidence, clinical outcomes are likely all we get to use. That could be risky, depending on the nature of the problem and the type of intervention, because low-quality evidence or the lack of it can misguide us.

However, when randomized-controlled trials “provide clear evidence of the effects of an intervention from high quality clinical trials, clinical outcome measures become relatively unimportant and measures of the process of care become more useful. When evidence of effects of interventions is strong, we should audit the process of care to see if it is consistent with what the evidence suggests is good practice. When there is little or no evidence (i.e. when practice cannot be evidence-based) we should audit clinical outcomes.” (11)

When it comes to considering whether an intervention works or does not work, we should take our time to examine the existing evidence and check our biases.

“Science is a way of trying not to fool yourself. The principle is that you must not fool yourself, and you are the easiest person to fool.” ~ Richard Feynman

While it is quite overwhelming to see the huge complexity and interactions among different factors that make up a research paper, it gets easier when we get familiar with recognizing its strengths and weaknesses, like building endurance for a marathon. We can avoid being misled by news headlines and blogs that may misinterpret or exaggerate a study. Also, we can appreciate the larger body of literature that gives us a better picture of a topic, such as whether massage therapy actually does decrease cortisol or not.



1. Field, T.M., 1998. Massage therapy effects. American Psychologist. 53, 1270e1281.

2. Field, T., Hernandez-Reif, M., Diego, M., Schanberg, S., Kuhn, C. Cortisol decreases and serotonin and dopamine increase following massage therapy. International Journal of Neuroscience. 2005. 115, 1397e1413.

3. Moyer CA, Seefeldt L, Mann ES, Jackley LM. Does massage therapy reduce cortisol? A comprehensive quantitative review. 2011. J Bodyw Mov Ther. Jan;15(1):3-14. doi: 10.1016/j.jbmt.2010.06.001. Epub 2010 Jul 2.

4. Zadro JR, O’Keeffe M, Maher CG. Evidence-based physiotherapy needs evidence-based marketing. Br J Sports Med. Published Online First: 27 October 2018. doi: 10.1136/bjsports-2018-099749.

5. Barash D. Science Needs the Freedom to Constantly Change Its Mind. Aeon. 27 October, 2015.

6. Novella S. In Defense of Prior Probability. Neurologica. 2015, 12 May.

7. Smith CA, de Lacey S, Chapman M, Ratcliffe J, Norman RJ, Johnson NP, Boothroyd C, Fahey P. Effect of Acupuncture vs Sham Acupuncture on Live Births Among Women Undergoing In Vitro Fertilization: A Randomized Clinical Trial. A Randomized Clinical Trial. JAMA. 2018 May 15;319(19):1990-1998. doi: 10.1001/jama.2018.5336.

8. Cook C, Sheets C. Clinical equipoise and personal equipoise: two necessary ingredients for reducing bias in manual therapy trials. J Man Manip Ther. 2011;19(1):55-7.

9. Christiane Nüsslein-Volhard, Banquet speech. NobelPrize.org.

10. O’Connell N. Breaking Research News – The sensory cortex is not a brick. Body in Mind. 2011, 23 Aug.

11. Herbert R, Jamtvedt G, Mead J, Hagen KB. Outcome measures measure outcomes, not effects of intervention. Aust J Physiother. 2005;51(1):3-4.

Website | + posts

A native of San Diego for nearly 40 years, Nick Ng is an editor of Massage & Fitness Magazine, an online publication for manual therapists and the public who want to explore the science behind touch, pain, and exercise, and how to apply that in their hands-on practice or daily lives.

An alumni from San Diego State University with a B.A. in Graphic Communications, Nick also completed his massage therapy training at International Professional School of Bodywork in San Diego in 2014.

When he is not writing or reading, you would likely find him weightlifting at the gym, salsa dancing, or exploring new areas to walk and eat around Southern California.