Reading and understanding scientific papers

As a healthcare professional, it is important to engage with scientific literature and interpret its findings in the light of your everyday interactions with patients. In this short piece, you’ll find some simple tips on how to read and understand scientific papers.

At first glance, research articles can be quite overwhelming. The depth of information often requires you to read the article numerous times, critique the results, and understand the findings in the context of the wider scientific questions. However, reading papers regularly, and with the right approach, is a valuable way to expand your knowledge of evidence-based healthcare.
Here, we will examine two well-known clinical studies in the field of haemophilia, and work through some key questions to consider when interpreting any research paper:
  • Manco-Johnson et al (2007). Prophylaxis versus Episodic Treatment to Prevent Joint Disease in Boys with Severe Hemophilia. NEJM; 357: 535 – 544.
  • Peyvandi et al (2016). A Randomized Trial of Factor VIII and Neutralizing Antibodies in Hemophilia A. NEJM; 374: 2054 – 2064.
It’s probably worth familiarising yourself with one or both articles before we begin.

1: What is the research question?

The introduction of a paper sets out the background and history of the subject in more detail. The authors establish key themes in order to highlight gaps in the overall scientific knowledge, and explain what they will do to address them. Here, it is good to consider their research question and why they are asking it.

Manco-Johnson et al (2007) begin by highlighting that haemarthrosis is a painful condition in children with haemophilia A, leading to the development of joint damage. Prophylactic treatment of patients with blood-derived factor VIII was established in the 1960s but curtailed in the 1980s due to contamination issues. Although safer recombinant forms of factor VIII are now available, whether prophylaxis slows the onset of joint damage, and the best regimen if this is so, have never been established.

Peyvandi et al (2016) begin with the statistic that 30% of patients receiving factor VIII develop inhibitory antibodies, which in turn reduce the protein’s efficacy. The source of factor VIII (plasma-derived or recombinant) may be an important factor in this, but previous studies have been inconclusive. Possible explanations for the differences include the carrier protein von Willebrand factor in plasma-derived products, additional plasma proteins and/or differences in protein translation in human-derived treatments. The study aims to compare the development of inhibitory antibodies in patients receiving recombinant or plasma-derived factor VIII.

2: What is the domain?

Who are the authors specifically addressing in their article? The journal in which they are publishing may be a good indication. But also consider the issues they highlight and the assumed level of knowledge.

Clearly, both sets of authors are addressing clinically based healthcare professionals with an in-depth knowledge of haemophilia care. Both papers have implications for professionals with an influence over prescriptions and treatment design for young children.

3: What is the study base?

In order to standardise the starting point for therapeutic intervention, and ensure that volunteers are a representative sample of the larger patient population, clinical trials have stringent recruitment criteria. The size of the study is also important, since larger samples yield results with a greater level of statistical confidence. In both examples here, the study base is clearly important given the research questions.

Manco-Johnson et al (2007) recruit children under 30 months with factor VIII levels below a set threshold (2 U/dL), little or no history of index joint haemorrhages, undetectable inhibitory antibody levels, normal platelet counts and healthy joint movement. Since the study aims to investigate which treatment is better at preventing the onset of joint damage, it is important that the patients don’t have this condition at the start of the trial.

Peyvandi et al (2016) required male children below 6 years of age with severe haemophilia A, no previous treatment with factor VIII or blood-derived products, and no existing inhibitory antibodies. From here, we can now see if inhibitors develop in these children, and whether recombinant or plasma-derived factor VIII induces this to a greater extent.

4: How is the study designed, and what are the determinants?

Consider how the authors propose to distribute the patients within the study, and measure and compare the outcomes to see any differences. Are they using the same measurement strategies as are used with the patients that you work with?

In both papers, the patients are randomized when allocated their treatment. Considering the study base, we know that the patients all present a similar disease state at the start of the study, so it is possible to randomly assign their treatment without producing a bias in the results.

Additionally, a very specific minimum number of patients is required so that a statistically significant difference can be detected. This is considered at the design stage of the trial, and a certain excess number is recruited to allow for dropouts during the trial.

Manco-Johnson et al (2007) required 64 patients in order to detect a difference in the incidence of bone or cartilage damage. They recruited on the assumptions that they would lose 10% for early joint damage, 7% for developing high titer inhibitors, 7% for haemorrhages and 10% during follow up. The researchers are also blinded; i.e. they don’t know which treatment the patients have been receiving. This is important given the qualitative nature of this type of assessment, where knowing that a patient is receiving a certain treatment could bias their assessment.

Peyvandi et al (2016) required a much larger sample size of 270 patients in order to detect the differences they hypothesized in the incidence of inhibitor generation, measured as a ‘hazard ratio’. They recruited just over 300 patients on the assumption that 10% would drop out. In this study, the data is collected in a much more objective way in the form of laboratory assays. For this reason, blinding of researchers is not required.
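
The arithmetic behind this kind of recruitment target can be sketched in a few lines of Python. This is only an illustration of the general idea, not the authors' own calculation: the function name and the assumption that a single dropout percentage is applied to the whole sample are ours.

```python
def patients_to_recruit(required_completers: int, dropout_percent: int) -> int:
    """Smallest recruitment number that still leaves enough patients
    after the assumed percentage has dropped out (hypothetical helper)."""
    remaining_percent = 100 - dropout_percent
    # Integer ceiling division: round up, because you cannot recruit part of a patient.
    return (required_completers * 100 + remaining_percent - 1) // remaining_percent


# 270 patients needed for the analysis, 10% assumed to drop out.
print(patients_to_recruit(270, 10))  # 300
```

With 270 patients needed and a 10% dropout allowance, the sketch gives 300, which is in line with the ‘just over 300’ reported above.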

5: What are the primary and secondary outcomes?

The measured endpoints in a clinical study can be considered ‘primary’ and ‘secondary’. The primary endpoints address the main question, which is established at the beginning. The secondary endpoints address important additional factors, which aren’t necessarily related to the main research question, but are nonetheless important. Examples include biomarker levels, severity of side-effects, treatment costs and hospital admissions.

Consider also the statistical methods which the authors use to compare their endpoints. Although an in-depth knowledge of statistics isn’t required to understand the implications of the results, a general grasp of the key concepts is very useful.

Manco-Johnson et al (2007) measure the preservation of index joint structure as their primary outcome. They want to know which treatment strategy is the most effective, and for this they use Fisher’s exact test. A summary of their findings can be found in Table 2 on page 540 of their paper. This method compares the 2 possible outcomes (‘joint damage’ or ‘no joint damage’) between the 2 treatment groups and generates a ‘p value’ based on this comparison. This value is the probability of seeing a difference at least as large as the one observed if the treatments truly had the same effect, so a low p value suggests that the difference between groups is unlikely to be due to chance alone. Generally, a p value of 0.05 or less is considered ‘significant’.
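
To make the idea of a p value more concrete, here is a minimal sketch of a two-sided Fisher’s exact test written with the Python standard library only. The counts are invented for illustration and are not taken from the paper, and the function name is our own. The test considers every 2 x 2 table that could have arisen with the same row and column totals, and adds up the probabilities of those that are no more likely than the table actually observed.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table
           damage   no damage
    grp 1    a         b
    grp 2    c         d
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Chance of k 'damage' outcomes in group 1 if treatment made no difference.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is as unlikely as, or less likely than, the observed one.
    p = sum(prob(k) for k in range(lowest, highest + 1)
            if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: 2 of 20 patients with joint damage in one group, 9 of 20 in the other.
p_value = fisher_exact_two_sided(2, 18, 9, 11)
print(f"p = {p_value:.3f}")
```

With these made-up counts the p value comes out at roughly 0.03, which would fall below the conventional 0.05 threshold.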

The number of infusions and total units of factor VIII administered are considered the secondary endpoints. These don’t address the initial question, but are important when considering the relative benefits of both treatment regimens from a cost and patient care perspective. They assess these outcomes using the t-test and Mann-Whitney U test. The differences between these two tests are beyond the scope of this article, but in summary, they are both methods to compare the measured outcomes for both groups and indicate if the difference can be considered statistically significant. Again, a p value is reported for each comparison as a measure of this.

Peyvandi et al (2016) want to know which kind of factor VIII generates higher occurrences of inhibitory antibodies as their primary outcome. They define this as a level of inhibitors above a measured threshold (0.4 Bethesda units). They assess their data using a visualisation of the development of inhibitors over time, called a Kaplan-Meier plot (Figure 2, page 2061). From this, a ‘hazard ratio’ can be determined, which compares the rate at which this outcome occurs in one treatment group with the rate in the other.
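
The curve in a Kaplan-Meier plot is simpler to build than it might first appear. The sketch below uses invented observations (not data from the paper) and a function name of our own choosing. Each patient contributes a time and a flag saying whether the outcome occurred or whether they left the study without it (‘censored’). At every time an outcome occurs, the proportion still outcome-free is multiplied by the fraction of at-risk patients who remained unaffected.

```python
def kaplan_meier(observations):
    """Return [(time, proportion still outcome-free)] for each time an outcome occurs.

    observations: list of (time, outcome_occurred) pairs; outcome_occurred is
    False for patients who were censored at that time.
    """
    at_risk = len(observations)
    outcome_free = 1.0
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, occurred in observations if time == t and occurred)
        leaving = sum(1 for time, _ in observations if time == t)
        if events:
            outcome_free *= 1 - events / at_risk
            curve.append((t, outcome_free))
        # Everyone observed at this time (event or censored) is no longer at risk.
        at_risk -= leaving
    return curve


# Hypothetical group of 6 patients: (exposure day, inhibitor developed?)
group = [(5, True), (8, True), (12, False), (20, True), (50, False), (50, False)]
for day, proportion in kaplan_meier(group):
    print(f"day {day}: {proportion:.2f} inhibitor-free")
```

Plotting one such curve per treatment group gives the familiar stepped figure. Estimating a hazard ratio from the two curves usually relies on a regression model, which is not attempted in this sketch.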

For their secondary outcome, the incidence of the development of ‘high-titer’ inhibitors (>5 Bethesda units) is examined. Again, the two treatment groups are compared with a Kaplan-Meier plot and a calculated hazard ratio. Notably, the authors also consider putative confounding variables (e.g. race, age and family history), and make ‘adjusted analyses’ of the data to see if these affect the outcomes.

6: What is the follow up time, and is there any loss to follow up?

The specific follow up included in a trial brings up a number of questions. Over what length of time have any benefits been shown, and does this relate to your standard practices? A loss to follow up in one treatment group may be the result of poor compliance. If volunteers on a trial comply poorly with a new treatment, will this be an issue for your patients? A difference in compliance between groups can also introduce a bias, especially if certain subgroups within the patient population are more likely to drop out than others.

Manco-Johnson et al (2007) follow their patients until age 6, and assess whether joint damage has occurred by this point. Within their study period, they report reasonably high levels of compliance with regard to data generation and return of data forms (specific percentages discussed on page 539).

Peyvandi et al (2016) follow patients for 50 exposure days, or 3 years, or until inhibitor levels reach a specific level. They then follow the patients for 6 months after this point. Although their trial was terminated early, resulting in the loss of 10 patients, a reasonably high percentage of patients (86%) complete the trial with no notable differences in drop out between treatment groups.

7: What are the conclusions, and are they reasonable?

Towards the end of a paper, the authors refer back to their initial question and discuss how their results relate to the broader scientific issues. Are their conclusions reasonable and as expected based on the data for the primary and secondary outcomes? Consider also what isn’t answered by the data, and what further studies could be performed to address this. Consider also implications such as costs and methods of monitoring. Is the level of monitoring in the trial practical in the clinic?

Manco-Johnson et al (2007) show that prophylaxis is effective in preventing index-joint damage in patients who have had up to five haemorrhages. Unlike previous studies, their patient criteria are a combination of age and medical history, and they suggest that this may be more relevant considering the pathogenesis of the disease. Regarding their secondary outcomes, although prophylactic treatment is demonstrably more efficacious, the number of infusions and cumulative cost of this regimen make it very expensive. Perhaps also worthy of consideration, although not explicitly mentioned, is the development of inhibitory antibodies in 2/30 prophylaxis patients.

Peyvandi et al (2016) discuss how plasma-derived factor VIII was associated with a lower risk of inhibitory antibody development than the recombinant alternative. The development of inhibitors for recombinant treatment was even higher than the researchers initially expected, and they comment that this may be due in part to their increased testing frequency compared to clinical practice and previous studies. Regarding their secondary outcomes, although recombinant factor VIII appears to increase the risk of high titer development, an insufficient amount of data meant that they could not establish this statistically. Although they initially highlight the various hypotheses regarding the biochemistry behind these differences, their results don’t directly address this.


Regularly reading scientific papers allows one to maintain an up-to-date knowledge of treatment options and implications. By focussing your reading on these questions, you can extract the right information and start to put the latest findings into practice.