Statistics for non-statisticians

Data measured in the clinic is often complex and multi-faceted, and as such, medical statistics is a large and diverse field to which one could dedicate an entire career. Although an in-depth knowledge of every type of statistical analysis is not necessary for those working with patients every day, an understanding of some of the key concepts is very beneficial for healthcare professionals.

Medical statistics allows us to make informed predictions based upon past observations. When prescribing a drug for a patient, very rarely has this treatment been tested in this person before, so we don’t know for certain how they will respond. However, from data generated by systematically testing treatments on similar patients in the past, we can make an informed prediction as to the best therapy for this individual.

Medical data can be thought of as either qualitative or quantitative. Qualitative data is categorical and often not numerical; examples include hospital ID and disease state (e.g. ‘Severe’ or ‘Moderate’). Although this data is useful for interpreting clinical research, this article will focus on quantitative data: numerical values measured for a population on a continuous scale, such as blood pressure or enzyme activity.

Common descriptive statistics: Measuring the ‘central tendency’

When measurements are made on an entire group of patients, it is not normally feasible to describe the detailed results of every individual. In most cases, researchers will give an indication of the ‘typical’ value or measurement, which is representative of the entire group’s data. This value is called the ‘measure of central tendency’ or ‘average’, and two common examples of this are the mean and median.

Mean: The mean of a data set is calculated by adding all of the values together and dividing by the total number of samples, often called the n number. Take the following example dataset, representing the diastolic blood pressure for a group of patients:

Patient Diastolic blood pressure (mmHg)
1 85
2 92
3 73
4 81
5 80
6 95
7 67
8 79

The mean diastolic blood pressure is calculated by adding all of the values (85 + 92 + 73 + 81 + 80 + 95 + 67 + 79 = 652) and dividing by the n number (in this case, 8). So, 652 ÷ 8 = 81.5.
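The same arithmetic can be sketched in a few lines of Python. The variable names are illustrative only; the values are those from the table above.

```python
# Diastolic blood pressure (mmHg) for the eight patients in the table
diastolic_bp = [85, 92, 73, 81, 80, 95, 67, 79]

total = sum(diastolic_bp)   # add all of the values together
n = len(diastolic_bp)       # the 'n number'
mean_bp = total / n         # divide the total by the number of samples

print(total, n, mean_bp)    # 652 8 81.5
```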

Median: The median is calculated by selecting the middle number when all of the numbers within a dataset are listed in numerical order. For an odd number of samples you simply take the middle number, but if there is an even number of samples then you take the mean of the two central numbers within the series.

For the example data above, listing the values numerically gives 67, 73, 79, 80, 81, 85, 92, 95 so we take the mean of 80 and 81, giving a median value of 80.5.
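As a minimal sketch, the odd and even cases can be handled with a small function. The function name is our own choice for illustration; Python's built-in statistics module offers an equivalent.

```python
def median(values):
    """Return the middle value of a dataset, or the mean of the two
    central values when the number of samples is even."""
    ordered = sorted(values)        # list the values in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd number of samples
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even number of samples

diastolic_bp = [85, 92, 73, 81, 80, 95, 67, 79]
median_bp = median(diastolic_bp)
print(median_bp)   # 80.5
```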

Depending on the dataset being analysed, either the mean or median can be considered the most ‘typical’ value. But to choose the most appropriate measure of central tendency, it is vital to consider how your data is spread around this ‘typical’ value, a pattern known as the distribution.

 

Data distribution

Data sets can follow a variety of patterns, but this article will focus on the two most common: normal and skewed distributions.

The bell-shaped curve of normal distribution: Consider the following graph showing the percentage reduction in pain for a group of 52 patients taking ‘Drug A’, an experimental medicine.

The data set is quite large, so to get an idea of how the data is spread out, we can count the number of samples within each range of values (e.g. ‘0 – 4%’, ‘5 – 9%’ etc), and plot these frequencies as a histogram.
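The counting step can be sketched as follows. The values below are hypothetical and are not the Drug A data. The sketch assumes whole-number, non-negative percentages and a range width of 5.

```python
from collections import Counter

def bin_frequencies(values, width=5):
    """Count how many values fall within each range of the given width."""
    # Map each value to the lower edge of its range, e.g. 7 -> 5, 12 -> 10
    counts = Counter((int(v) // width) * width for v in values)
    return {
        f"{start} - {start + width - 1}%": counts[start]
        for start in sorted(counts)
    }

# Hypothetical percentage reductions, for illustration only
reductions = [2, 7, 8, 12, 13, 14, 16, 21, 23, 9]
frequencies = bin_frequencies(reductions)

# A crude text histogram: one '#' per patient in each range
for label, count in frequencies.items():
    print(f"{label:>9} {'#' * count}")
```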

When visualised like this, the data follows a symmetrical pattern often referred to as a ‘bell-shaped curve’. The values near the highest point on the curve can be considered more ‘typical’ of the population, and as you move away from the centre towards either ‘tail’ of the curve, values become more infrequent.

When the mean and median are plotted you can see that they are almost identical. Although both would be an acceptable measure of central tendency, generally the mean is reported.

Although the mean on its own can tell you some information about this population, it is also useful to consider how spread out the data is around this central point. This is where the standard deviation is useful.

Imagine that another drug, ‘Drug B’, has also been tested on a separate group of 52 patients.

Below, we have plotted the data for Drug B as a histogram alongside Drug A for comparison.

Although the mean reduction in pain is the same for both groups, the data for Drug B is less spread out around this value. To quantify this, the ‘standard deviation’ is calculated by finding the average of the squared differences from the mean (the values are squared so that negative and positive differences don’t cancel each other out) and then taking the square root of the result, which returns the measure to the original units.
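A minimal sketch of the calculation is shown below. It uses the diastolic blood pressure values from earlier in the article, since the individual Drug A and Drug B measurements are not listed. The average of the squared differences is the variance, and its square root is the standard deviation. Note that most statistical software divides by n − 1 rather than n when the data is a sample from a larger population; the `sample` flag here is our own illustrative parameter for switching between the two.

```python
from math import sqrt

def standard_deviation(values, sample=True):
    """Square root of the average squared difference from the mean."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]
    divisor = n - 1 if sample else n   # n - 1 for a sample, n for a whole population
    variance = sum(squared_diffs) / divisor
    return sqrt(variance)

diastolic_bp = [85, 92, 73, 81, 80, 95, 67, 79]
population_sd = standard_deviation(diastolic_bp, sample=False)
sample_sd = standard_deviation(diastolic_bp)
print(round(population_sd, 2), round(sample_sd, 2))   # 8.63 9.23
```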

Normally, the standard deviation is reported alongside the mean with a ± sign:

Drug Mean ± SD (% reduction)
A 48.1 ± 17.5
B 48.1 ± 9.6

As a general rule for normally distributed data, approximately 68% of the data falls within one standard deviation of the mean, and approximately 95% falls within two standard deviations. We can see this pattern quite clearly if we plot the standard deviations onto the normal distribution curve.
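These percentages can be checked against the theoretical normal curve, and then applied to the reported figures for Drug A. This is a sketch using Python's standard library (version 3.8 or later); the helper name `share_within` is our own.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # normal curve with mean 0 and SD 1

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1sd = share_within(1)
within_2sd = share_within(2)
print(round(within_1sd, 3), round(within_2sd, 3))   # 0.683 0.954

# Applying the rule to Drug A (mean 48.1, SD 17.5, from the table above)
mean_a, sd_a = 48.1, 17.5
range_1sd = (mean_a - sd_a, mean_a + sd_a)           # roughly 30.6 to 65.6
range_2sd = (mean_a - 2 * sd_a, mean_a + 2 * sd_a)   # roughly 13.1 to 83.1
```

So for Drug A, about two thirds of patients would be expected to show a reduction of between roughly 31% and 66%.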

As such, the standard deviation gives an idea of the range within the data where you can expect most of the values to be found.

The skewed distribution of asymmetric data

The following graph shows the number of hospital admissions for a group of 68 patients on a clinical trial. Again, data has been grouped into a series of distinct ranges covering all values, and the frequencies plotted as a histogram.

You can see that rather than being distributed symmetrically, the data is skewed in one direction. Although most patients were admitted to hospital between 3 and 12 times, some patients were admitted many more times than this. Since the tail extends to the right of the peak, we would say that ‘the data is skewed to the right’.

Now consider where the mean and median sit within this data set. Although both values can be calculated, because of the values extending to the right of the peak of the data, the mean is pulled in this direction.

For this reason, the median is the most appropriate ‘typical’ value. An equal number of values within the data sit either side of this central number, and it isn’t affected by extremely high or low values in the same way that the mean is.
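The effect of a long right-hand tail can be illustrated with a small, entirely hypothetical set of admission counts (not the trial data described above). A single patient with many admissions pulls the mean upwards, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical number of hospital admissions per patient, for illustration only
admissions = [3, 4, 5, 5, 6, 7, 8, 9, 12, 40]

mean_admissions = mean(admissions)       # pulled towards the extreme value of 40
median_admissions = median(admissions)   # middle of the ordered values
print(mean_admissions, median_admissions)   # 9.9 6.5

# Removing the extreme value changes the mean far more than the median
trimmed = admissions[:-1]
mean_trimmed = mean(trimmed)
median_trimmed = median(trimmed)
```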

Although other types of distribution do sometimes occur, most data sets in medicine will conform to either a normal or skewed distribution.

Hypothesis testing – asking more in-depth questions of the data

It is useful to visualise and describe the distribution of data around a central point. However, often we need to scrutinise the data even further to address our research questions. Not only do we need to describe the trends within groups, but we also need to compare groups with each other to detect differences. In clinical trials, for example, different treatments are compared to see which ones are more efficacious. To properly detect and describe these differences, we need hypothesis testing.

A multitude of statistical tests are used in medical research, and the decision as to which is the most appropriate depends on the type of data being tested and the questions being asked of it. Nonetheless, the principles behind hypothesis testing are the same for each kind of test, and can be thought of as a three-step process.

Step 1: Establish the null hypothesis

The first step which a researcher takes is to establish the null and alternative hypotheses. When reading medical literature, you can often consider this yourself with regards to the research questions and primary and secondary outcomes of a trial.

Generally, two opposing hypotheses are considered. Once established, the data can then be tested to see which one is the best description. Usually, these hypotheses are:

The null hypothesis: This is normally the ‘status quo’ option, i.e. that there is no real difference between the groups. For example, in a haemophilia trial, the null hypothesis may be “Drug A does not reduce occurrences of haemarthrosis more than Drug B”.

The alternative hypothesis: This is normally what the researchers are trying to find evidence for, such as “Drug A reduces occurrences of haemarthrosis more than Drug B”.

Step 2: Decide on an appropriate test

Once the hypotheses have been established, an appropriate statistical test will be performed to test them. The specific tests used within a study will depend upon a number of factors, such as the direction in which an expected difference would occur, how many groups are being compared and, importantly, how the data is distributed. The distribution is particularly important, as the statistical tests make ‘assumptions’ about the data’s distribution around its central value when comparing the groups.

Statistical tests can broadly be considered as one of two kinds of test depending on the data’s distribution around its typical value. These tests are:

Parametric tests assume that the data sets being compared follow a bell-shaped curve distribution, and can be described by a mean and standard deviation. Common examples include the t-test and the analysis of variance (ANOVA).

Non-parametric tests don’t require the data to conform to a normal distribution, and are normally used to compare data with a skewed distribution. A common example is the Mann-Whitney U test.

Step 3: Generate a ‘p-value’ and reject or accept the null hypothesis

Although the nuances of each of these tests vary, essentially they ask one central question of the data: If there were really no difference between the groups, how likely is it that we would see a difference at least this large by chance alone? This is where the p-value comes in.

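The p-value is generated by a statistical test, and is the probability of observing a difference at least as large as the one measured if the null hypothesis were true. It is expressed as a number between 0 and 1, where a low value indicates that the observed data would be unlikely if there were no genuine difference, and values approaching 1 indicate that the data is entirely consistent with there being no difference between the groups being compared. It is not the probability that the null hypothesis itself is true.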

A p-value of 0.05 or less is generally considered to be statistically ‘significant’ in research. A value of 0.05 means that, if there were truly no difference between the groups, a difference at least as large as the one observed would arise through random chance about 5% of the time. It does not mean that there is a 95% probability that the differences are due to the experimental conditions. When the p-value is at or below this threshold, we reject the null hypothesis in favour of the alternative hypothesis; otherwise, we fail to reject the null hypothesis.
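One way to build intuition for what a p-value measures is a permutation test, sketched below. This is a different technique from the t-test and other tests named in this article, and is used here only because its logic is easy to follow: if the treatment labels made no difference, shuffling them between patients should often produce a gap as large as the one observed. The data and function name are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Returns the proportion of all possible relabellings of the pooled data
    that give a difference at least as large as the one observed.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding
        if abs(mean(relabelled_a) - mean(relabelled_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical percentage reductions for two small groups
treatment_a = [30, 35, 28, 40, 33]
treatment_b = [48, 52, 45, 50, 55]
p_value = permutation_p_value(treatment_a, treatment_b)
print(round(p_value, 4))
```

Here only 2 of the 252 possible relabellings give a gap as large as the observed one, so the p-value is small: a difference this size would rarely arise if the labels were meaningless.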

An example study

To put these concepts into practice, imagine a trial where the patients are given experimental drugs to control joint inflammation. Seventy patients are given ‘Treatment A’ or ‘Treatment B’ for 6 months, and the percentage reduction in inflammation is measured as the primary endpoint.

On the right, we have calculated the frequency of values occurring between intervals of 5. From this, we can visualise both data sets as a histogram:

Both data sets follow a symmetrical bell-shaped curve, so we can describe them in terms of their mean and standard deviation:

Treatment Mean ± SD (% reduction in inflammation)
A 35 ± 23.9
B 49 ± 15.8

From this information, it would appear that there is a difference in outcome between the two groups. However, we still don’t know if this effect is significant or merely occurring by chance. This is where the statistical test is used.

Step 1: Identify the null hypothesis

Since we are comparing the clinical outcome between two types of treatment, we could consider the following as our hypotheses:

Null hypothesis: There is no difference in reduction in inflammation between the two types of treatment.

Alternative hypothesis: There is a difference in the reduction in inflammation induced by the two treatments.

Step 2: Decide on an appropriate test

Since we are comparing two data sets following a symmetrical distribution around the mean, we can use a parametric test to scrutinise the data. A common technique for comparing two means is the Student’s two-tailed t-test. This test assumes that the difference can occur in either direction (i.e. we haven’t hypothesised the direction in which a difference will occur).

Step 3: Generate a p-value and accept/reject the null hypothesis

Statistical analysis of the data using this test yields a p-value of 0.007. Since this p-value is less than 0.05, we can reject the null hypothesis in favour of the alternative hypothesis that there is a difference between the responses to these two treatments. In the literature, this finding could be written as:

“In our trial, Treatment B induced a significantly greater reduction in joint inflammation than Treatment A (Student’s t-test, p = 0.007)”
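As a rough cross-check, the t statistic behind a result like this can be sketched from the summary table alone. The group sizes are an assumption here (35 per treatment, i.e. the 70 patients split evenly), and the reported means and standard deviations are rounded, so this sketch is not expected to reproduce the p-value of 0.007 exactly. The function name is our own.

```python
from math import sqrt

def t_statistic(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Student's t statistic (pooled variance) from summary statistics."""
    pooled_variance = (
        (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    ) / (n_1 + n_2 - 2)
    standard_error = sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return (mean_1 - mean_2) / standard_error

n_per_group = 35   # ASSUMPTION: the article does not state how the 70 patients were split

# Treatment B (49 ± 15.8) compared with Treatment A (35 ± 23.9)
t = t_statistic(49, 15.8, n_per_group, 35, 23.9, n_per_group)
degrees_of_freedom = 2 * n_per_group - 2
print(round(t, 2), degrees_of_freedom)   # 2.89 68
```

A t statistic of this size is well above the value of roughly 2 that marks the two-tailed 0.05 threshold at this sample size, which is consistent with the significant result reported above.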

A graph of the data showing the mean, standard deviation (as error bars) and p-value could also be included:

Conclusion

Having a general understanding of the concepts behind data distribution and statistical testing is useful when interpreting clinical data. Understanding how data is tested to interpret trial findings helps one to appreciate the initial study design, and evaluate the significance of the results.